Computation of P values for Halley's prediction of the comet of 1758

Supplementary Material
for
Letter (Accommodation or Prediction?)
Science 308: 1410
John Aach and George M. Church
Department of Genetics, Harvard Medical School

Last modified: June 8, 2005

 

Contents

 

Background

When a scientific hypothesis is confirmed, the data confirming it may either have been gathered before or after the hypothesis was formulated. The latter is called a prediction and the former an accommodation of the data. Peter Lipton (1) has recently argued that predictions carry more evidential weight than accommodations because there is more opportunity for investigators to 'fudge' the data in accommodations than predictions. We believe this emphasis on fudging is misplaced on two counts. First, if accommodation offers increased opportunity, there remains considerable latitude for fudging the confirmation of predictions when relevant observations are subject to choice and interpretation. Erroneously confirmed predictions figure prominently in problematic science (2) such as N-rays (3,4) and cold fusion (5,6). Second, the emphasis on fudging overlooks the powerful ability of statistical analysis to evaluate the degree of confirmation of both predictions and accommodations. In our Letter to Science, we explore the latter by estimating P values for predictions and accommodations associated with a famous historical prediction mentioned by Lipton: Edmond Halley's hypothesis, published in 1705, that cometary appearances in 1531, 1607, and 1682 were all from the same comet, and that the comet would return in 1758. Halley's prediction was subsequently confirmed. Here we describe how we estimated P values for Halley's prediction and for corresponding accommodations.

We note that our calculation of P values is intended only to illustrate how statistics may be used to evaluate the relative degrees of support that may be accorded to predictions vs. accommodations in the context of a historically famous prediction, but is not intended to be a complete statistical analysis of Halley's actual prediction taking into account all astronomical details. Specifically, only the near periodicity of cometary arrivals is considered in P value calculations and not orbital characteristics or the relationship between deviations from pure periodicity and perturbations due to passage near planets. (See below for a discussion of the impact of these assumptions, however.)

 

Methods

Halley considered 24 comets that had appeared between 1337 and 1698 and found three separated by 75 or 76 year intervals with similar orbital characteristics (7). His prediction of a comet of 1758 was based on the assumption that the orbit was elliptical and that the comet should therefore reappear ~76 years from its last visit in 1682 (see notes). We computed a P value for this and similar predictions relative to the null hypothesis that comets appeared at random with historic rates using the MatLab m-file "halleyrnd.m" (download here). To compute a P value for Halley's prediction,

  1. Using Poisson statistics, halleyrnd.m first randomly generates the number of comets that would be expected to appear in a 362 year period from historic data on comet observation rates. halleyrnd.m then randomly generates that number of cometary appearances over the 362 period using uniform distributions. Using simple assumptions, we attempted to model Halley's presumed selection bias for which observed comets would be part of his sample into the cometary observation rates (see notes below).

  2. halleyrnd.m then randomly generates cometary appearances in the period subsequent to the 362 year observaton period using the same cometary observation rate information.

  3. halleyrnd.m then searches the appearances within the 362 year observation period and looks for all sets of 3 or more comets that appear to be in sequence, and for which the next member of the sequence would be beyond the 362 observation period. If no such sequence is found, the current simulation is rejected and another is generated starting at step 1. In determining whether appearances are in a sequence, individual cometary appearances are allowed to deviate from precise sequence positions by a specified error tolerance (see notes below).

  4. For each simulation of cometary appearances (1 and 2 above) for which one or more sequences (3 above) are detected, halleyrnd.m then checks to see if any of the predictions for the post-observation period are within the error tolerance of a randomly generated appearance in this period. If any prediction meets this criteria, a 'confirmed' prediction is counted.

  5. halleyrand.m iterates until a specified number of simulations that generate predictions (3 above) is reached, and reports the ratio of confirmed predictions to generated predictions as the P value of prediction.

halleyrnd.m is also used to generate P values for accommodative confirmations of a hypothesis of cometary return. Halley's confirmed prediction involves finding 4 cometary appearances in sequence, the first 3 within the initial observation period and the 4th subsequent observation. A corresponding accommodative confirmation also involves finding 4 cometary appearances in sequence in a specified observation period. To compute a P value for such an accommodation, halleyrnd.m goes through steps 1 and 2 above to simulate cometary appearances, and searches the appearance set for a sequence as in step 3 above. However, in this case, halleyrnd.m simply counts any simulation with a sequence of 4 appearances as successful, performs a specified number of iterations, and reports the ratio of successful to total iterations as the P value for accommodation. There is no need to reject iterations when computing P values for accommodation compared to prediction (see step 3 above). This is because the probabilities computed for accommodation are of the form P(A,B), and those for prediction of the form P(A|B), where B = first 3 observations in sequence and A = 4th observation in the sequence of B. To estimate probabilities for prediction, we must focus on only those simulations which meet the condition B, and count the number of those simulations that also meet A. To compute P(A|B) using nn trials therefore involves open-ended iterations of comet simulation that continue until nn trials meeting condition B have been found. For accommodation, however, no such constraints apply to comet simulations, and a probability estimate of P(A,B) using nn trials needs only nn simulations.

However, a subtle difference between predictions and accommodations must be taken into account in simulating 'corresponding' accommodations. Predictions start from a fixed observation period and, based on them, generate future predictions that are beyond them, but accommodations consider only events within a fixed period and never look beyond that period. What observation period should be assigned to an accommodation to 'correspond' to Halley's prediction? If we simply look for accommodations within Halley's initial 362 year observation period, accommodations will have smaller sets of cometary appearances in which to find a sequence of 4+ appearance than do predictions which look beyond the 362 years. On the other hand, if we extend the observation period beyond 362 years for accommodations, they will tend to have more cometary appearances to consider than do predictions, as the latter only look at a single appearance beyond the 362 year period. We dealt with this issue simply by considering different sized extensions to the 362 period when computing accommodation P values: (i) the period beyond the 362 years that corresponds to the actual reappearance of Halley's comet in 1758 (60 years beyond 1698), (ii) the average extension beyond the 362 that corresponds to randomly confirmed predictions of cometary return (found to be ~77.9 years), (iii) an arbitrary short period (10 years).

P values reported in our Letter were based on at least 3 runs of 10000 trials each (i.e. 10000 simulations for accommodations, and 10000 simulations passing the constraint in step 3 for predictions), with one exception: The P value for a prediction of a 5th appearance based on 4 observations was performed with only 3000 trials because of the very high rejection rate in step 3 (to find 3000 simulations with an appropriate sequence of 4 required > 282600 iterations). Since halleyrnd.m is based on random number generation, different simulations will result in different P value estimates. We found that multiple halleyrnd.m runs with nn=10000 performed for the same predictions and accommodations generally produced P values within .01 of each other. P values presented in the Letter were based on aggregates of all runs done for the given parameters rounded to two decimal places. In addition to the simulations described above based on Halley's 362 year observation period, we also generated P value estimates on the assumption of an observation period limited to the 152 year period corresponding to 1531-1682, the first and third observations of the returning comet considered by Halley. These simulations were performed the same way as those described above except for the different periods involved (see notes for details). Simulation data from the individual runs used to generate P values reported in the Letter or on this web site may be found here.

 

DETAILED NOTES

 

Additional results and discussion

A complete list of the results of all of the simulations used in our Letter and on this web site may be found here. Here we provide some general observations that amplify on results cited in the Letter and in this list of simulation results.

  1. The P value of a prediction is computed from the likelihood, by chance, that a random sequence of k observations found in the observation period, when extended to element k+1 outside the period, is found in randomly simulated comet appearance data, within sequence tolerances. Since the occurrence of the predicted appearance in the random post-observation data is independent of the random observation data used to predict it, and the randomly generated data follows a uniform distribution, the probability of a confirmed prediction should be independent of k. However, the probability of a 'corresponding' accommodation based on a sequence of k+1 appearances (see above) should decrease with both k and the length of the observation period because (i) longer sequences are always extensions of shorter sequences, so that the likelihood of finding a sequence of k+1 is necessarily smaller than that of finding a sequence of k, and because (ii) given fixed cometary appearance rates, shorter observation periods will tend to contain fewer appearances and therefore less opportunity to form sequences of fixed length k+1. Here we consider how this prediction of the behavior of P values compares with our simulations. In particular, we observe:

    1. The P value of accommodations based on observation periods of 362 + x years and 4 observations was found to decrease with x. For x = 935 months = 77.9 years (the average extension of randomly confirmed predictions beyond a 362 year observation period for Halley's prediction), the P value is ~0.12. For x = 60 years (the extension beyond the 362 period corresponding to Halley's actually predicted return of the comet in 1758), the P value is ~0.11. For x = 10 years (an arbitrary short extension period), the P value is ~0.09.

    2. We also considered predictions and accommodations based on an observation period defined by the actual three appearances of Halley's comet in his list of 24 on which he based his prediction of 1758. From the argument above, we should expect that the P value of predictions made based on this 152 year subset of Halley's full 362 year observation period should be the same as that for the full 362 period, while P values for accommodations based on 152 + x year periods should be smaller than those based on 362 + x year periods containing them. In fact we find the P value for prediction based on this 152 year observation period drops slightly from ~0.03 to ~0.02, while the P value for an accommodation based on a 152 + 76 year period (a subset of the 362 + 60 year period described in a above) drops to ~0.08 from ~0.11.

    3. Finally, using Halley's full 362 year observation period, we considered predictions of a 5th appearance based on 4 observations in sequence, and accommodations over 362 + 60 years based on 5 observations. Again, the P value of the prediction dropped to ~0.02 from the ~0.03 P value for the prediction based on 362 years in a above, while the P value for accommodation for 362 + 60 years dropped from the ~.011 value in a above to ~0.08.

    From these observations, it appears that our expectations for P values for accommodation are confirmed, but P values for prediction may have a tendency to drop when k is increased or the observation period is decreased, while our initial expectation is that they should be constant. In fact, a slight drop in P values for prediction under these circumstances might be expected for the following reason: In computing P values for prediction via random simulation, all sequences of k are examined which extend beyond the observation period, and a successful prediction is scored whenever any one of them is found among the random appearances for the post-observation period (within error tolerances). Thus, as longer observation periods will contain more appearances and more opportunities for appropriate sequences of k terms, they present an increased probability of multiple predictions and thereby an increased P value of a confirmed prediction. Likewise, increasing k should make it harder to find multiple k-sequences, thereby decreasing the random likelihood of prediction. Thus, both decreasing the observation period, or increasing k, should lead to a drop in the P value of prediction.

    This consideration being noted, over the course of our simulations we did not attempt to evaluate the degree to which this phenomenon might have accounted for our observation of lower P values for prediction associated with longer sequences or shorter observation periods. We did note the occasional occurrence of multiple predictions in debugging, and our impression is that they were rare events. We also note that the limited number of trials (3000) we performed for prediction of a 5th appearances from sequences of 4 (for reasons noted above) makes this observation less secure than those of P values based on larger numbers of trials.

  2. Effect of taking orbital characteristics and gravitational theory into account: While, as noted, our P value computations do not take orbital charactistics and gravitational perturbations into account, general considerations suggest that doing so should not greatly change the relative strengths attributed to prediction vs. accommodation by statistical analysis, so long as the null hypothesis is that orbital characteristics are independent of the time of appearance. We consider two aspects:

  3. In 2. above we observed that correct estimation of P values depends on the need for a complete and correct specification of the hypothesis under study in relation to the assumed null distribution. Here we note that this relates to Peter Lipton's claims about fudging, in that where rules are incompletely or incorrectly specified, it is possible to be unaware of or ignore multiple hypotheses or adjustable parameters, and this effect can lead to incorrect estimates of the degree of validity of confirmations. For instance, if several hypotheses were considered in the course of developing an accommodative hypothesis, but only one that is confirmed is acknowledged and reported, the P value for this accommodative confirmation should be judged in the context of the original full set of multiple hypotheses, and since these are unacknowledged and unreported the degree of validity of the confirmation will be overestimated. The greater opportunity present in accommodations compared to predictions, for considering and rejecting hypothesis that are unconfirmed before finding and reporting one that is confirmed, may largely account for the lesser validity that Peter Lipton attributes to accommodations.

    However, predictions are not immune from biased assessments of validity due to incorrect and incomplete specifications of parameters and hypotheses. For instance, in assessing the P value of Halley's prediction, it would be easy, but incorrect, to focus on the 152 years between 1531 and 1682, the dates of the first and third actual observations of the comet, as defining the observation period from which the P value of his prediction should be judged. However, this would be an incorrect null hypothesis from which to judge the value of his confirmed prediction, because Halley searched a larger 362 year observation period for comet appearances that followed an approximate sequence, and, as we noted above, there are reasons why the P value for prediction may drop when based on shorter observation periods, making the value of the confirmation seem larger than it should be. Although in this case we observed only a small effect from this incorrect null, changes in the way the problem is specified could have larger effects. E.g., if we modified the null hypothesis that cometary appearances follow a uniform distribution in the post-observation period, and assumed, rather, that the probability of observing a comet at a point in time steadily increases as the post-observation period progresses (a different, but reasonable, way of modeling the historic trend towards higher rates of comet observations in Halley's time than assumed above), the assumption of 152 year observation period would further depress P value estimates compared to those estimated more correctly from the 362 year period. The general rule is that care must be taken to completely and correctly specify hypotheses and null assumptions for predictions as well as accommodations, and that to the extent that this is not done, predictions, like accommodations, can be subject to fudging.

    We note, finally, that this consideration underscores the limitations of distinguishing prediction and accommodation in terms of the time relation of an observer to confirming observations -- i.e., that for the former one or more confirming observations are in the observer's future while for the latter all are in the past. Lipton makes reference to this mode of distinguishing prediction and accommodation in the introduction to (1), where he summarizes arguments that purport to demonstrate why prediction and accommodation should be equivalent based on the irrelevancy of whether observations are in an observer's future or past. However, statistically, the entire range and time interval of attempted observation is relevant to judging the degree of validity of the confirmations, not merely the confirming observations, the particular observations used to formulate the hypotheses, and their times. This provides an additional argument for rejecting the distinction between prediction and accommodation based on observation times that is not mentioned by Lipton in his criticism of this conception.

 

Download halleyrnd

You may download the halleyrnd.m MatLab m-file here

 

halleyrnd usage notes

The syntax for running halleyrnd.m in MatLab is

pval=halleyrnd(mode_str,nn,uu,hh,ee,pp,logfile)

'pval' is the P value returned by the simulation. Parameters are as follows:

  1. mode_str is 'predict' or 'accommodate' or any prefix ('p' or 'a' will do). Default = 'p'.

  2. nn is the number of trials that will be performed (see above). As noted, for prediction, nn trials will typically require >> iterations of simulation of cometary appearances, but for accommodation, nn trials will entail exactly nn iterations. Default = 1000.

  3. uu defines the period of observation for predictions and associated cometary appearance rates. For accommodations, the observation period defined by uu is extended as specified by the pp parameter (below). uu has the syntax [rate1 period1 ; rate2 period 2], where rate1 and rate2 are rates of cometary appearances / year and period1 and period2 are the lengths of the corresponding periods in months. Default = [ .06 3144 ; .08 2200], which corresponds to the estimates of rates applicable to Halley's actual 362 year observation period (see notes).

  4. hh is the sequence 'hit count' for prediction or accommodation within the observation period. For prediction, hh defines the number of appearances in sequence that must be found in the observation period; for an iteration to be accepted, the next element of the sequence of hh must additionally be beyond the observation period. For accommodation, all iterations are accepted, and any that contain a sequence of hh appearances is considered a successful trial. Default = 3, which corresponds to the hit count for the observation period associated with Halley's prediction. To run a corresponding accommodation, hh should be set to 4.

  5. ee is the error tolerance for deviation of appearance times from an exact sequence. Default= 1 (see notes).

  6. pp defines the post-observation period. pp has the form [rate period], where these elements have the same meanings as for uu (cometary appearance rate/yr and period length in months). The default for pp = [.08 2200], which extends the observation period 2200 months (= 183.33 years) with the same appearance rate as the latter part of the observation period. This is large enough to contain any prediction from any observation period and hh described in the Letter. (Explanation: the largest possible interval in a sequence of hh observations in a 362 year perid is 362/(hh-1); therefore the latest possible prediction based on hh=3 observations must be 362/2 = 181 years.) For accommodations, pp represents a fixed extension to the observation period that must be set to a value appropriate to the particular case.

  7. logfile: the name of a file to which results will be recorded. If no log file is provided, the only output of halleyrnd is the output variable pval. If logfile is provided, this output is recorded in the file along with a trace of all parameters and some statistical information. If the file name starts with a '>', it will be appended with information from the current run; otherwise, the old information will be erased.

halleyrnd.m defaults are set to do predictions based on Halley's actual observation period. The following syntaxes are examples that correspond to simulations described in the Letter.

 

References

  1. Lipton, P (2005) "Testing hypotheses: Prediction and prejudice" Science 307: 219-221 (PubMed)

  2. Irving Langmuir (1953) "Pathological science," as edited by Robert N. Hall (Oct. 1989) Physics Today: 36-48

  3. Irving M. Klotz (1980) "The N-ray affair" Scientific American 242: 168-175

  4. Malcolm Ashmore (1993) "The theater of the blind: Starring a Promethian prankster, a phony phenomenon, a prism, a pocket, and a piece of wood" Social Studies of Science 23: 67-106.

  5. US DOE Energy Research Advisory Board (1989) DOE/S-0073 DE90 005611 (here)

  6. US DOE Office of Science (2004) "Report of the review of low energy nuclear reactions" (here)

  7. Peter Lancaster-Brown (1985) "Halley and his comet" (Blandford Press, Dorset 1985). See Table 1, p.179 for perihelion dates and orbital characteristics of the 24 comets considered by Halley.

Contact us

Please contact John Aach with any questions, issues, or concerns.

 

Copyright

Copyright (c) 2005 by John Aach and the President and Fellows of Harvard University