Problems with the NCDC Global Temperature Data

Update Nov 24th:  NCDC appears to have updated their site, and the errors have disappeared.  The initial report is that the data had been misaligned.  Since I have both the original and the updated data, I will be reviewing this.

I have been looking at tweaking the blended temperature set that I have been using.  My intent has been to replace the Hadley data with the NCDC data (Global, Land & Sea).  While looking through the data I found some bizarre discrepancies.  The problem might be due to the presentation, but so far I have not found another source for their data, so I cannot verify this.

The data from their site is tedious to extract, as it is available month by month from 1880 to the present.  After I built my tables of data, I compared the Beta-V3 and V2 data.  The main problem I found is that the stated difference between the two sets is not being calculated correctly.  This problem is most evident in April, September and December.  Sometimes enormous differences exist between the versions, but they do not show up in the stated difference.

For example, April of 1996 has V2: 0.71 °C and Beta-V3: 0.24 °C.  The difference stated on the site is 0.02 °C instead of the correct 0.47 °C.  April and December are full of such errors.  Here is a chart of the errors for these two months.

The Inconvenient Skeptic

Error is the Version Difference from the NCDC Website
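
The April 1996 check above is easy to automate once the monthly values are in a table.  Here is a minimal Python sketch of that check.  The record layout, the `find_mismatches` name and the tolerance are my own assumptions for illustration, and since I do not know which sign convention the site uses for its difference, the sketch compares magnitudes only.  The single row is the April 1996 example from this post.

```python
# Each record: (year, month, v2_anomaly, v3_anomaly, stated_difference), all in °C.
# This layout is an assumption; only the April 1996 row comes from the post.
records = [
    (1996, 4, 0.71, 0.24, 0.02),
]

# Assumed tolerance to absorb rounding of two-decimal values.
TOLERANCE = 0.015


def find_mismatches(rows, tolerance=TOLERANCE):
    """Return rows whose stated difference disagrees with |V2 - V3|."""
    mismatches = []
    for year, month, v2, v3, stated in rows:
        computed = round(abs(v2 - v3), 2)
        # Compare magnitudes only, because the sign convention is unknown.
        if abs(computed - abs(stated)) > tolerance:
            mismatches.append((year, month, computed, stated))
    return mismatches


for year, month, computed, stated in find_mismatches(records):
    print(f"{year}-{month:02d}: computed {computed:.2f} °C, stated {stated:.2f} °C")
```

Run against the full monthly tables, a check like this would list every month where the stated difference does not match the two versions.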

Needless to say, it is difficult to trust data that cannot even correctly state the difference between its own two versions.  Having a significant difference between versions is not bad by itself, but having a large difference and then understating it is a problem.  That is exactly what has happened here.

This could be a problem with the website data, but as of now it appears that the error is real and not just related to the website.  This leads to the question of which set is correct and what other errors exist in the data provided by the NCDC.  It also supports my approach of minimizing the errors in the different sets by merging them into a composite data set.

When the anomaly data is annualized and plotted, V3 appears to have a slightly higher anomaly than V2.  It is a rare year that V2 has a higher anomaly than V3, especially since 1940.  Grossly understating the difference while introducing a set with a higher anomaly seems like a problem that deserves a deeper understanding.  It is this type of error that causes people to mistrust the data.

(Green) NCDC V2 Anomaly, (Blue) NCDC V3-Beta Anomaly
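
For anyone who wants to repeat the annualized comparison, here is a rough Python sketch.  It assumes the monthly anomalies are held in dictionaries keyed by (year, month) and that annualizing means a simple mean over the twelve months; both are my assumptions, as are the function names.  The demo values are made-up placeholders, not NCDC data.

```python
from collections import defaultdict
from statistics import mean


def annualize(monthly):
    """Average monthly anomalies (°C) per year, keeping only complete years.

    `monthly` maps (year, month) -> anomaly. A plain 12-month mean is an
    assumption; it may not be how any published annual figure is derived.
    """
    by_year = defaultdict(list)
    for (year, _month), value in monthly.items():
        by_year[year].append(value)
    return {y: mean(v) for y, v in sorted(by_year.items()) if len(v) == 12}


def years_where_v2_higher(v2_monthly, v3_monthly):
    """List the years in which the annualized V2 anomaly exceeds V3."""
    v2 = annualize(v2_monthly)
    v3 = annualize(v3_monthly)
    return [y for y in v2 if y in v3 and v2[y] > v3[y]]


# Made-up placeholder values purely to exercise the functions (NOT NCDC data).
v2_demo = {(2000, m): 0.10 for m in range(1, 13)}
v2_demo.update({(2001, m): 0.30 for m in range(1, 13)})
v3_demo = {(2000, m): 0.20 for m in range(1, 13)}
v3_demo.update({(2001, m): 0.25 for m in range(1, 13)})

print(years_where_v2_higher(v2_demo, v3_demo))
```

With the real tables in place of the placeholders, the returned list shows how rare the years are in which V2 runs above V3.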

Posted in Anomaly by inconvenientskeptic on November 23rd, 2010 at 8:59 am.


