Computation of P values for Halley's prediction of the comet of 1758

Supplementary Material
for
Letter (Accommodation or Prediction?)
Science 308: 1410
John Aach and George M. Church
Department of Genetics, Harvard Medical School

Last modified: June 8, 2005

Background
Methods
Additional results and discussion
Download halleyrnd
hallyrnd usage notes
References
Contact us
Copyright

Background

When a scientific hypothesis is confirmed, the data confirming it may either have been gathered before or after the hypothesis was formulated. The latter is called a prediction and the former an accommodation of the data. Peter Lipton (1) has recently argued that predictions carry more evidential weight than accommodations because there is more opportunity for investigators to 'fudge' the data in accommodations than predictions. We believe this emphasis on fudging is misplaced on two counts. First, if accommodation offers increased opportunity, there remains considerable latitude for fudging the confirmation of predictions when relevant observations are subject to choice and interpretation. Erroneously confirmed predictions figure prominently in problematic science (2) such as N-rays (3,4) and cold fusion (5,6). Second, the emphasis on fudging overlooks the powerful ability of statistical analysis to evaluate the degree of confirmation of both predictions and accommodations. In our Letter to Science, we explore the latter by estimating P values for predictions and accommodations associated with a famous historical prediction mentioned by Lipton: Edmond Halley's hypothesis, published in 1705, that cometary appearances in 1531, 1607, and 1682 were all from the same comet, and that the comet would return in 1758. Halley's prediction was subsequently confirmed. Here we describe how we estimated P values for Halley's prediction and for corresponding accommodations.

We note that our calculation of P values is intended only to illustrate how statistics may be used to evaluate the relative degrees of support that may be accorded to predictions vs. accommodations in the context of a historically famous prediction, but is not intended to be a complete statistical analysis of Halley's actual prediction taking into account all astronomical details. Specifically, only the near periodicity of cometary arrivals is considered in P value calculations and not orbital characteristics or the relationship between deviations from pure periodicity and perturbations due to passage near planets. (See below for a discussion of the impact of these assumptions, however.)

Methods

Halley considered 24 comets that had appeared between 1337 and 1698 and found three separated by 75 or 76 year intervals with similar orbital characteristics (7). His prediction of a comet of 1758 was based on the assumption that the orbit was elliptical and that the comet should therefore reappear ~76 years from its last visit in 1682 (see notes). We computed a P value for this and similar predictions relative to the null hypothesis that comets appeared at random with historic rates using the MatLab m-file "halleyrnd.m" (download here). To compute a P value for Halley's prediction,

Using Poisson statistics, halleyrnd.m first randomly generates the number of comets that would be expected to appear in a 362 year period from historic data on comet observation rates. halleyrnd.m then randomly generates that number of cometary appearances over the 362 period using uniform distributions. Using simple assumptions, we attempted to model Halley's presumed selection bias for which observed comets would be part of his sample into the cometary observation rates (see notes below).
halleyrnd.m then randomly generates cometary appearances in the period subsequent to the 362 year observaton period using the same cometary observation rate information.
halleyrnd.m then searches the appearances within the 362 year observation period and looks for all sets of 3 or more comets that appear to be in sequence, and for which the next member of the sequence would be beyond the 362 observation period. If no such sequence is found, the current simulation is rejected and another is generated starting at step 1. In determining whether appearances are in a sequence, individual cometary appearances are allowed to deviate from precise sequence positions by a specified error tolerance (see notes below).
For each simulation of cometary appearances (1 and 2 above) for which one or more sequences (3 above) are detected, halleyrnd.m then checks to see if any of the predictions for the post-observation period are within the error tolerance of a randomly generated appearance in this period. If any prediction meets this criteria, a 'confirmed' prediction is counted.
halleyrand.m iterates until a specified number of simulations that generate predictions (3 above) is reached, and reports the ratio of confirmed predictions to generated predictions as the P value of prediction.

halleyrnd.m is also used to generate P values for accommodative confirmations of a hypothesis of cometary return. Halley's confirmed prediction involves finding 4 cometary appearances in sequence, the first 3 within the initial observation period and the 4th subsequent observation. A corresponding accommodative confirmation also involves finding 4 cometary appearances in sequence in a specified observation period. To compute a P value for such an accommodation, halleyrnd.m goes through steps 1 and 2 above to simulate cometary appearances, and searches the appearance set for a sequence as in step 3 above. However, in this case, halleyrnd.m simply counts any simulation with a sequence of 4 appearances as successful, performs a specified number of iterations, and reports the ratio of successful to total iterations as the P value for accommodation. There is no need to reject iterations when computing P values for accommodation compared to prediction (see step 3 above). This is because the probabilities computed for accommodation are of the form P(A,B), and those for prediction of the form P(A|B), where B = first 3 observations in sequence and A = 4^th observation in the sequence of B. To estimate probabilities for prediction, we must focus on only those simulations which meet the condition B, and count the number of those simulations that also meet A. To compute P(A|B) using nn trials therefore involves open-ended iterations of comet simulation that continue until nn trials meeting condition B have been found. For accommodation, however, no such constraints apply to comet simulations, and a probability estimate of P(A,B) using nn trials needs only nn simulations.

However, a subtle difference between predictions and accommodations must be taken into account in simulating 'corresponding' accommodations. Predictions start from a fixed observation period and, based on them, generate future predictions that are beyond them, but accommodations consider only events within a fixed period and never look beyond that period. What observation period should be assigned to an accommodation to 'correspond' to Halley's prediction? If we simply look for accommodations within Halley's initial 362 year observation period, accommodations will have smaller sets of cometary appearances in which to find a sequence of 4+ appearance than do predictions which look beyond the 362 years. On the other hand, if we extend the observation period beyond 362 years for accommodations, they will tend to have more cometary appearances to consider than do predictions, as the latter only look at a single appearance beyond the 362 year period. We dealt with this issue simply by considering different sized extensions to the 362 period when computing accommodation P values: (i) the period beyond the 362 years that corresponds to the actual reappearance of Halley's comet in 1758 (60 years beyond 1698), (ii) the average extension beyond the 362 that corresponds to randomly confirmed predictions of cometary return (found to be ~77.9 years), (iii) an arbitrary short period (10 years).

P values reported in our Letter were based on at least 3 runs of 10000 trials each (i.e. 10000 simulations for accommodations, and 10000 simulations passing the constraint in step 3 for predictions), with one exception: The P value for a prediction of a 5^th appearance based on 4 observations was performed with only 3000 trials because of the very high rejection rate in step 3 (to find 3000 simulations with an appropriate sequence of 4 required > 282600 iterations). Since halleyrnd.m is based on random number generation, different simulations will result in different P value estimates. We found that multiple halleyrnd.m runs with nn=10000 performed for the same predictions and accommodations generally produced P values within .01 of each other. P values presented in the Letter were based on aggregates of all runs done for the given parameters rounded to two decimal places. In addition to the simulations described above based on Halley's 362 year observation period, we also generated P value estimates on the assumption of an observation period limited to the 152 year period corresponding to 1531-1682, the first and third observations of the returning comet considered by Halley. These simulations were performed the same way as those described above except for the different periods involved (see notes for details). Simulation data from the individual runs used to generate P values reported in the Letter or on this web site may be found here.

DETAILED NOTES

Cometary appearance rates: We used an on-line database of cometary appearances based on the Catalog of Cometary Orbits produced by Brian G. Marsden and Gareth V. Williams to identify all comets which were observed between 1337 and 1531, finding 62. Our first assumption is that the subset of 24 which Halley used was based on availability of records and information that would have allowed him to estimate orbital characteristics of the comets. Our second assumption is that all such information would have been more likely to have been available for later comet appearances than earlier ones, due to increasing scientific interest and observational capability.

Analysis of the 62 cometary appearances showed that they appeared at a higher rate towards the end of the period than the beginning, consistent with this assumption of greater scientific interest and capability. For simplicity, we divided the 362 period into an initial 262 year period, during which the rate of recorded cometary appearances was 0.13/yr, and a last 100 year period, during which this rate was 0.18/yr, and assumed that Halley's selection of comets was 24/62 of these rates in these periods, i.e., that Halley considered comets at the rate of 0.06/yr for the period between 1337 and 1598, and at the rate 0.08/yr for the period between 1599 and 1698. A spreadsheet documenting our estimates may be found here. Note that we did not attempt to use Halley's actual list of 24 comet observations (7) in estimating the rates for these two periods.
As explained above, prediction and accommodation both involve simulation of cometary appearances after the 362 year period. We used the .08/yr rate of cometary appearances for all such simulations.
All period lengths defined to halleyrnd.m are assumed to be in months, and all P value estimates assume an error tolerance for sequences of 1 month. This means that when computing random cometary sequences from a set of randomly generated appearances, all pairs of two appearances a₁ and a₂ with appearance times t₁ < t₂ are considered, and subsequent appearances that are within +/- 1 month of t₁ + k*(t₂-t₁) for k = 1, 2, ... are assumed to be in the sequence.

Actual cometary returns are greatly affected by gravitational and orbital circumstances, and since we did not model these in our P value estimations, we used a correspondingly simple assumption concerning tolerances. Halley's prediction of the perihelion of the comet of 1758 used orbital data and gravitational physics, taking into account the perturbing effects of Jupiter. His prediction of the perihelion of the comet of 1758 was 3 months off, but refined calculations by Clairaut that also took Saturn into account were only one month off (see 7). Similarly, the perihelions associated with the comet in 1531, 1607, and 1682 were actually separated by 914 and 899 months (based on data from here). For an exact 76 year period, there would be 912 months between successive perihelions.

Additional results and discussion

A complete list of the results of all of the simulations used in our Letter and on this web site may be found here. Here we provide some general observations that amplify on results cited in the Letter and in this list of simulation results.

The P value of a prediction is computed from the likelihood, by chance, that a random sequence of k observations found in the observation period, when extended to element k+1 outside the period, is found in randomly simulated comet appearance data, within sequence tolerances. Since the occurrence of the predicted appearance in the random post-observation data is independent of the random observation data used to predict it, and the randomly generated data follows a uniform distribution, the probability of a confirmed prediction should be independent of k. However, the probability of a 'corresponding' accommodation based on a sequence of k+1 appearances (see above) should decrease with both k and the length of the observation period because (i) longer sequences are always extensions of shorter sequences, so that the likelihood of finding a sequence of k+1 is necessarily smaller than that of finding a sequence of k, and because (ii) given fixed cometary appearance rates, shorter observation periods will tend to contain fewer appearances and therefore less opportunity to form sequences of fixed length k+1. Here we consider how this prediction of the behavior of P values compares with our simulations. In particular, we observe:

The P value of accommodations based on observation periods of 362 + x years and 4 observations was found to decrease with x. For x = 935 months = 77.9 years (the average extension of randomly confirmed predictions beyond a 362 year observation period for Halley's prediction), the P value is ~0.12. For x = 60 years (the extension beyond the 362 period corresponding to Halley's actually predicted return of the comet in 1758), the P value is ~0.11. For x = 10 years (an arbitrary short extension period), the P value is ~0.09.
We also considered predictions and accommodations based on an observation period defined by the actual three appearances of Halley's comet in his list of 24 on which he based his prediction of 1758. From the argument above, we should expect that the P value of predictions made based on this 152 year subset of Halley's full 362 year observation period should be the same as that for the full 362 period, while P values for accommodations based on 152 + x year periods should be smaller than those based on 362 + x year periods containing them. In fact we find the P value for prediction based on this 152 year observation period drops slightly from ~0.03 to ~0.02, while the P value for an accommodation based on a 152 + 76 year period (a subset of the 362 + 60 year period described in a above) drops to ~0.08 from ~0.11.
Finally, using Halley's full 362 year observation period, we considered predictions of a 5^th appearance based on 4 observations in sequence, and accommodations over 362 + 60 years based on 5 observations. Again, the P value of the prediction dropped to ~0.02 from the ~0.03 P value for the prediction based on 362 years in a above, while the P value for accommodation for 362 + 60 years dropped from the ~.011 value in a above to ~0.08.

all

above

Effect of taking orbital characteristics and gravitational theory into account: While, as noted, our P value computations do not take orbital charactistics and gravitational perturbations into account, general considerations suggest that doing so should not greatly change the relative strengths attributed to prediction vs. accommodation by statistical analysis, so long as the null hypothesis is that orbital characteristics are independent of the time of appearance. We consider two aspects:
- If we start with the simple hypothesis that the probability that two comet appearances have the same orbital characteristics by chance is q, then the P values for prediction and accommodation associated with a sequence of k cometary appearances are both adjusted by a common factor of q^k, so that both the relative order and the relative size of these P values are unchanged. This assumes the simplest possible null distribution for cometary orbital characteristics, whereby orbital characteristics follow a uniform distribution in which the probability that a comet is in any one orbital class is q. However, assuming a more realistic distribution of cometary orbit characteristics in which there are C classes of orbit, and where the probability of a comet being in class i is q_i, the probability that a sequence of k comets are all in the same orbital class is simply q' = Sum_i=1..C[q_i^k], and this factor q' again applies equally to prediction and accommodation.
- Now we consider the case in which returning comets are not assumed to reappear with a constant period, but where instead successive appearance times are variably offset from a strictly periodic sequence due to perturbations by planets. We may represent the appearance times of such a comet by a sequence t₀, t₀ + r + o₁, t₀ + 2r + o₂, ..., where r represents the constant period, and o₁, o₂, ... represents the sequence of offsets from strictly periodic behavior that are due the effects of gravitational perturbations for successive returns. We may assume that gravitational theory and astronomical observations allow the specification of some rule for the sequence of the o_i. Then, assuming such a rule, a prediction using this rule from observations provided in an observation period will still result in the specification of a particular cometary arrival time after the end of this period. Given our null distribution assumption of a uniform cometary arrival rate in the post-observation period, the probability that this prediction will be successful depends only on this rate and the error tolerance for successful prediction, and this probability will be the same as it would be if the prediction were based on a purely periodic sequence. Similarly, the probability of a successful accommodation of a sequence of k appearances based on the rule will be found given the null assumption of uniform and random appearance times will depend only on the rate of appearance and error tolerances, and should be the same as the random probability of appearance of a strictly periodic sequence.
  
  The gist of this argument is that our P value calculations should come out the same given the null hypothesis of random appearance regardless of whether we assume that comets return with a strict period, or that they appear in a sequence that is determined by some set rule. In this way, the relative P values for prediction and accommodation that are estimated under the assumption of strict periodicity should be the same P values that should result when gravitational theory and planetary perturbation are taken into account. However, the important qualification is that a single completely specified rule has been considered for both prediction and accommodation. If multiple rules are considered or rules contain adjustable parameters, P values must be adjusted to take multiple hypotheses into account. Depending on circumstances, this could have differential effects on prediction vs. accommodation.
In 2. above we observed that correct estimation of P values depends on the need for a complete and correct specification of the hypothesis under study in relation to the assumed null distribution. Here we note that this relates to Peter Lipton's claims about fudging, in that where rules are incompletely or incorrectly specified, it is possible to be unaware of or ignore multiple hypotheses or adjustable parameters, and this effect can lead to incorrect estimates of the degree of validity of confirmations. For instance, if several hypotheses were considered in the course of developing an accommodative hypothesis, but only one that is confirmed is acknowledged and reported, the P value for this accommodative confirmation should be judged in the context of the original full set of multiple hypotheses, and since these are unacknowledged and unreported the degree of validity of the confirmation will be overestimated. The greater opportunity present in accommodations compared to predictions, for considering and rejecting hypothesis that are unconfirmed before finding and reporting one that is confirmed, may largely account for the lesser validity that Peter Lipton attributes to accommodations.

However, predictions are not immune from biased assessments of validity due to incorrect and incomplete specifications of parameters and hypotheses. For instance, in assessing the P value of Halley's prediction, it would be easy, but incorrect, to focus on the 152 years between 1531 and 1682, the dates of the first and third actual observations of the comet, as defining the observation period from which the P value of his prediction should be judged. However, this would be an incorrect null hypothesis from which to judge the value of his confirmed prediction, because Halley searched a larger 362 year observation period for comet appearances that followed an approximate sequence, and, as we noted above, there are reasons why the P value for prediction may drop when based on shorter observation periods, making the value of the confirmation seem larger than it should be. Although in this case we observed only a small effect from this incorrect null, changes in the way the problem is specified could have larger effects. E.g., if we modified the null hypothesis that cometary appearances follow a uniform distribution in the post-observation period, and assumed, rather, that the probability of observing a comet at a point in time steadily increases as the post-observation period progresses (a different, but reasonable, way of modeling the historic trend towards higher rates of comet observations in Halley's time than assumed above), the assumption of 152 year observation period would further depress P value estimates compared to those estimated more correctly from the 362 year period. The general rule is that care must be taken to completely and correctly specify hypotheses and null assumptions for predictions as well as accommodations, and that to the extent that this is not done, predictions, like accommodations, can be subject to fudging.

We note, finally, that this consideration underscores the limitations of distinguishing prediction and accommodation in terms of the time relation of an observer to confirming observations -- i.e., that for the former one or more confirming observations are in the observer's future while for the latter all are in the past. Lipton makes reference to this mode of distinguishing prediction and accommodation in the introduction to (1), where he summarizes arguments that purport to demonstrate why prediction and accommodation should be equivalent based on the irrelevancy of whether observations are in an observer's future or past. However, statistically, the entire range and time interval of attempted observation is relevant to judging the degree of validity of the confirmations, not merely the confirming observations, the particular observations used to formulate the hypotheses, and their times. This provides an additional argument for rejecting the distinction between prediction and accommodation based on observation times that is not mentioned by Lipton in his criticism of this conception.

Download halleyrnd

You may download the halleyrnd.m MatLab m-file here

halleyrnd usage notes

The syntax for running halleyrnd.m in MatLab is

pval=halleyrnd(mode_str,nn,uu,hh,ee,pp,logfile)

'pval' is the P value returned by the simulation. Parameters are as follows:

mode_str is 'predict' or 'accommodate' or any prefix ('p' or 'a' will do). Default = 'p'.
nn is the number of trials that will be performed (see above). As noted, for prediction, nn trials will typically require >> iterations of simulation of cometary appearances, but for accommodation, nn trials will entail exactly nn iterations. Default = 1000.
uu defines the period of observation for predictions and associated cometary appearance rates. For accommodations, the observation period defined by uu is extended as specified by the pp parameter (below). uu has the syntax [rate1 period1 ; rate2 period 2], where rate1 and rate2 are rates of cometary appearances / year and period1 and period2 are the lengths of the corresponding periods in months. Default = [ .06 3144 ; .08 2200], which corresponds to the estimates of rates applicable to Halley's actual 362 year observation period (see notes).
hh is the sequence 'hit count' for prediction or accommodation within the observation period. For prediction, hh defines the number of appearances in sequence that must be found in the observation period; for an iteration to be accepted, the next element of the sequence of hh must additionally be beyond the observation period. For accommodation, all iterations are accepted, and any that contain a sequence of hh appearances is considered a successful trial. Default = 3, which corresponds to the hit count for the observation period associated with Halley's prediction. To run a corresponding accommodation, hh should be set to 4.
ee is the error tolerance for deviation of appearance times from an exact sequence. Default= 1 (see notes).
pp defines the post-observation period. pp has the form [rate period], where these elements have the same meanings as for uu (cometary appearance rate/yr and period length in months). The default for pp = [.08 2200], which extends the observation period 2200 months (= 183.33 years) with the same appearance rate as the latter part of the observation period. This is large enough to contain any prediction from any observation period and hh described in the Letter. (Explanation: the largest possible interval in a sequence of hh observations in a 362 year perid is 362/(hh-1); therefore the latest possible prediction based on hh=3 observations must be 362/2 = 181 years.) For accommodations, pp represents a fixed extension to the observation period that must be set to a value appropriate to the particular case.
logfile: the name of a file to which results will be recorded. If no log file is provided, the only output of halleyrnd is the output variable pval. If logfile is provided, this output is recorded in the file along with a trace of all parameters and some statistical information. If the file name starts with a '>', it will be appended with information from the current run; otherwise, the old information will be erased.

halleyrnd.m defaults are set to do predictions based on Halley's actual observation period. The following syntaxes are examples that correspond to simulations described in the Letter.

halleyrnd('p',10000,[],[],[],[],'>hlog.logfile') -- P value for prediction based on Halley's actual 362 year observation period.
halleyrnd('a',10000,[],4,[],[.08 935],'>hlog.logfile') -- P value of accommodation where the actual observation period is extended by 935 months, where 935 = approximate average extension beyond the observation period for randomly confirmed predictions in a series of prediction runs of the prior form.
halleyrnd('a',10000,[],4,[],[.08 120],'>hlog.logfile') -- P value of accommodation where the actual observation period is extended by 10 years.
halleyrnd('a',10000,[],4,[],[.08 720],'>hlog.logfile') -- P value of accommodation where the actual observation period is extended by 60 years (to the actual 1758 return date of Halley's comet).
halleyrnd('p',100,[],4,[],[],'>hlog.logfile') -- P value of prediction based on Halley's 362 year observation accommodation that would be applicable if he had found a sequence of 4 appearances in this period and had predicted a 5th beyond the period. (30 runs with nn=100 were aggregated to generate the P value for nn=1000 reported in the Letter.)
halleyrnd('a',10000,[],5,[],[.08 720],'>hlog.logfile') -- P value of accommodation corresponding to the prediction above that assumed that Halley had observed 4 appearances in sequence and predicted a 5th. Here the corresponding accommodation entails seeing 5 appearances in sequence within the actual observation period extended by 60 years (to the actual 1758 return date of Halley's comet).
halleyrnd('a',10000,[.06 816 ; .08 1008],4,[],[.08 912],'>hlog.logfile') -- P value of accommodation where the observation period is assumed to be the years between 1531 and 1682, the interval between the first (1531) and third (1682) actual appearances of the comet that corresponded to the 3 / 24 observations Halley used for his prediction, and the extension to this period is assumed to be 76 years (the 1758 predicted appearance of the comet).
halleyrnd('p',10000,[.06 816 ; .08 1008],3,[],[],'>hlog.logfile') -- P value of a prediction using the observation period above (period between the first and third siting of the comet).

References

Lipton, P (2005) "Testing hypotheses: Prediction and prejudice" Science 307: 219-221 (PubMed)
Irving Langmuir (1953) "Pathological science," as edited by Robert N. Hall (Oct. 1989) Physics Today: 36-48
Irving M. Klotz (1980) "The N-ray affair" Scientific American 242: 168-175
Malcolm Ashmore (1993) "The theater of the blind: Starring a Promethian prankster, a phony phenomenon, a prism, a pocket, and a piece of wood" Social Studies of Science 23: 67-106.
US DOE Energy Research Advisory Board (1989) DOE/S-0073 DE90 005611 (here)
US DOE Office of Science (2004) "Report of the review of low energy nuclear reactions" (here)
Peter Lancaster-Brown (1985) "Halley and his comet" (Blandford Press, Dorset 1985). See Table 1, p.179 for perihelion dates and orbital characteristics of the 24 comets considered by Halley.

Contact us

Please contact John Aach with any questions, issues, or concerns.