Forecast Central

Basic Contest Information

  • Introduction

    Forecast Central is an online weather forecasting contest in the USA, intended as a fun and interactive teaching tool for earth science and other meteorology-related classes at all levels. But it's also available for weather geeks who just like to predict and obsess about the weather.

    During most of the school year, a national "forecast-of-the-day" is issued for each weekday. The assignment is to forecast for a single specified site, which can be any of more than 800 National Weather Service (NWS) observing sites (mostly airports) in the 50 U.S. states. For example, Raleigh-Durham, North Carolina is a site; its ICAO ID is KRDU. We only forecast for sites that have a good record of reliable hourly observations, and that also have NWS automated Model Output Statistics (MOS) forecasts. A single forecast variable is also specified, which could be 24-hr minimum temperature (Tmin), 24-hr maximum temperature (Tmax) or 24-hr total accumulated liquid (including melted snow/ice) precipitation amount (precip24). Later this year the snow depth at the end of the day will be added to the contest. So after a forecast-of-the-day is posted in the morning, all forecasters have most of the rest of the day (until 0000 UTC/7 pm EST, 4 pm PST, etc.) to submit a numerical forecast in degrees Fahrenheit or inches of precipitation (because that's how weather information is presented in the U.S.), along with an optional comment.

    For the contest, a day is defined as going from 0600 UTC on one day to 0600 UTC on the next day. That's approximately midnight, depending on the time zone and the status of daylight saving time. The reason for that choice is that the NWS reports 6-hr Tmin, Tmax and precipitation amounts at 0000, 0600, 1200 and 1800 UTC, and 0600 UTC is closest to midnight for most of the country.
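
    To make the day definition concrete, the sketch below (plain Python, with a made-up contest date) lists the four 6-hr NWS reporting periods that make up one contest day under the 0600-to-0600 UTC definition.

      from datetime import datetime, timedelta, timezone

      # Hypothetical contest date: the contest "day" runs 0600 UTC to 0600 UTC.
      day_start = datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc)

      # The 6-hr Tmin/Tmax/precip reports covering this day end at 1200 and
      # 1800 UTC on the same date, then 0000 and 0600 UTC on the next date.
      for hours in (6, 12, 18, 24):
          period_end = day_start + timedelta(hours=hours)
          print("6-hr report ending", period_end.strftime("%Y-%m-%d %H%M UTC"))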

    The daily forecast site and variable are chosen either because they present an interesting or difficult forecast situation, or because they can be used to illustrate meteorological concepts (since school classes use the contest as a teaching tool). For example, a site may be chosen in an area where dramatic temperature changes are expected, or where there is a possibility of heavy precipitation. The situation will sometimes be more subtle, with conditions that are just unusual or that illustrate a meteorological concept. For example, a day could demonstrate the effect of cloud cover or snow cover on minimum or maximum temperatures.

  • Common pitfalls

    It's common for beginning forecasters to make mistakes on precipitation forecasts in a couple of ways. First, it's important to remember that a precipitation forecast is in decimal inches, so a forecast of "1" means 1 whole inch, which is quite a lot of rain or snow. For most U.S. locations, there will be only a few days per year with that much precipitation or more in a single day. Usually forecasts are in hundredths of an inch, so include the decimal point -- "0.25" means 25 hundredths of an inch, or one quarter of an inch. Second, it's often good to be a little conservative with precip forecasts. It's common for a site to get no precip at all, or just a few hundredths of an inch, even on days when much heavier precipitation was predicted.

    Another pitfall is forgetting to consider the entire day for temperature forecasts. Usually the daily minimum temperature occurs at about sunrise and the daily maximum during the afternoon. But with strong fronts, or cloud cover and precip complications, the Tmin or Tmax could occur at the very beginning or end of the day.

  • Observations and reference forecasts

    For each day, the NWS observations for the site are reported on the "OBSERVED" line. The observations are downloaded and processed every hour, although sometimes missing data causes problems. The 6-hr precipitation amounts are used to calculate the daily total, so there can sometimes be a long delay before new precipitation shows up in the OBSERVED line.
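
    As a rough sketch of how that works (the values below are made up, and this is not the contest's actual processing code), the OBSERVED precip24 is just the sum of the four 6-hr reports, and the total stays incomplete until the last report arrives:

      # Hypothetical 6-hr precipitation reports (inches) for one contest day,
      # keyed by the UTC hour ending each period. None = report not yet received.
      six_hr_precip = {"1200": 0.00, "1800": 0.12, "0000": 0.31, "0600": None}

      received = [amt for amt in six_hr_precip.values() if amt is not None]
      if len(received) == len(six_hr_precip):
          print("precip24 =", round(sum(received), 2), "inches")
      else:
          # A missing or delayed 6-hr report holds up the daily total.
          print("daily total incomplete;", round(sum(received), 2), "inches so far")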

    The observations from the previous day for the same site and variable are reported as the "PERSISTENCE" forecast. That is often used in meteorology as a "reference forecast": a simple standard against which other forecasts can be compared. Persistence forecasts are usually quite poor, for the obvious reason that the weather in the mid-latitudes usually changes significantly from one day to the next. In the summer, or in the tropics, persistence could be a reasonable forecast.
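
    A persistence forecast is simple enough to write in a couple of lines. In the sketch below (made-up numbers), yesterday's observed Tmax is used as today's forecast and then compared against what actually happened after a cold front:

      yesterday_tmax_obs = 62.0   # observed Tmax yesterday (deg F), made-up value
      today_tmax_obs = 48.0       # observed Tmax today, after a cold front

      persistence_forecast = yesterday_tmax_obs          # "today will be like yesterday"
      error = abs(persistence_forecast - today_tmax_obs)
      print("persistence error:", error, "deg F")        # large when the weather changes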

    Another type of reference forecast is "CLIMATE", which consists of the climatological averages for each variable on that particular day, obtained from NWS datasets. Again, the climate forecasts are usually not very good, but the average temperatures can be useful. The average 24-hr precipitation amount is usually not very useful, because most sites have many days of no precipitation mixed with fewer wet days, making the average very unlike a "typical day".
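
    The sketch below (with a made-up month of daily precipitation values) shows why the climatological average precipitation is a poor stand-in for a typical day: a handful of wet days pulls the mean well above what you would see on most days.

      from statistics import mean, median

      # Made-up month of daily precip24 values (inches): mostly dry, a few wet days.
      daily_precip = [0.0] * 24 + [0.05, 0.10, 0.40, 0.75, 1.20, 1.60]

      print("average day:", round(mean(daily_precip), 2), "inches")   # about 0.14
      print("typical day:", median(daily_precip), "inches")           # 0.0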

  • MOS forecasts

    Model Output Statistics (MOS) forecasts are issued every six hours, for the two major NWS atmospheric models that cover the entire country, including Alaska and Hawaii: the Global Forecast System (GFS) and the North American Mesoscale (NAM) model. Neither model is superior for all sites and weather situations, but professional forecasters often have their own preferences. The models use different numerical methods, physics packages and parameterization schemes, so the forecasts can be quite different, but in some situations they can be very similar. It's common for professional forecasters to look at MOS forecasts as part of their forecast preparation, but it's a mistake (and violates the spirit of a forecasting contest) to just adopt MOS forecasts as your own. MOS forecasts have a lot of skill, but sometimes also have characteristic weaknesses in certain situations. We will sometimes choose sites and variables where the GFS and NAM forecasts do not agree, as that can occur in difficult or unusual weather scenarios.

    In our contest, the MOS forecasts provide a context for our own human forecasts. The best forecasters should be able to beat MOS over time, but beginning forecasters may have difficulty with that.
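
    One simple way to check whether you are beating MOS over time is to compare mean absolute errors. The sketch below uses made-up forecasts and observations, not contest data, and "GFS MOS" here is just a label for one set of reference forecasts:

      # Made-up Tmax forecasts (deg F) from one forecaster and from GFS MOS,
      # with the verifying observations, over several contest days.
      human    = [55, 61, 48, 70, 66]
      gfs_mos  = [57, 60, 44, 72, 65]
      observed = [54, 63, 47, 71, 64]

      def mae(forecasts, obs):
          """Mean absolute error over a set of forecast/observation pairs."""
          return sum(abs(f - o) for f, o in zip(forecasts, obs)) / len(obs)

      print("human MAE:  ", mae(human, observed))     # 1.4
      print("GFS MOS MAE:", mae(gfs_mos, observed))   # 2.2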

  • Scoring

    At the end of the day, all of the forecasts are compared to the observed value to get the error of each forecast. Each forecast is assigned a score based on the standard score or z-score, which "normalizes" the errors in terms of how many standard deviations each forecast error falls below the mean error of all forecasters. For the z-score, a value of zero normally means that the forecast error is equal to the average error, but we add 2 to the z-score to shift the distribution toward positive numbers. That means that a forecaster whose error is equal to the average of all forecasters' errors gets a score of 2. A score of 3 means that the error was one standard deviation smaller than the average error, which would be a good forecast. A score of 1 means that the error was one standard deviation larger than the average error, which would be a poor forecast. Any score less than zero is set to zero, which would be a very poor forecast.
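
    The description above implies a formula along these lines. The sketch below is only an illustration of that shifted, clipped z-score, not the contest's actual code; it assumes the errors are absolute errors and that the mean and standard deviation are taken over all forecasters' errors for the day.

      from statistics import mean, pstdev

      def daily_scores(errors):
          """Shifted, clipped z-scores: 2 = average error, higher is better, floor at 0."""
          avg = mean(errors)
          sd = pstdev(errors)
          if sd == 0:
              return [2.0 for _ in errors]   # everyone had the same error
          # Smaller-than-average errors give a positive z, so better forecasts score higher.
          return [max(0.0, 2.0 + (avg - err) / sd) for err in errors]

      # Example: absolute Tmax errors (deg F) for four forecasters on one day.
      print([round(s, 2) for s in daily_scores([1.0, 2.0, 3.0, 6.0])])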

    It's important to understand that the relationship of the scores to the actual values of the forecast error will differ greatly on different days. For example, if a Tmax forecast is relatively easy, and all of the MOS and human forecasts are within a few degrees, then an error of 5 degrees could be very poor. But on a more complicated day when Tmax forecasts vary widely, an error of 5 degrees could be very good.
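
    Continuing with made-up numbers, the sketch below scores a 5-degree error against two different days: an easy day where everyone was within a few degrees, and a hard day where the errors varied widely. The helper here mirrors the scoring sketch above.

      from statistics import mean, pstdev

      def score_of(err, all_errors):
          """Score one forecast error against the day's distribution of errors."""
          z = (mean(all_errors) - err) / pstdev(all_errors)
          return max(0.0, 2.0 + z)

      easy_day = [1.0, 2.0, 2.0, 3.0, 5.0]       # everyone within a few degrees
      hard_day = [5.0, 9.0, 12.0, 15.0, 20.0]    # forecasts varied widely

      print("5-degree error on the easy day:", round(score_of(5.0, easy_day), 1))   # about 0.2
      print("5-degree error on the hard day:", round(score_of(5.0, hard_day), 1))   # about 3.4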

    A forecaster gains awards (symbols by their name) by having the highest score for a day, or by having the highest combined daily scores over an entire week, month or year. So the week, month and year awards reward forecasters who combine forecast quality (high daily scores) with high participation (frequent forecasts).

    The forecaster rating is the median day score, which means the score in the center of the distribution of all of that forecaster's daily scores. The median is a more robust measure than the mean (average), so one or two terrible or great forecasts may not have much of an effect on the rating. Instead, it should better reflect the quality of the total set of forecasts.
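
    A small sketch (with made-up daily scores) of why the median is used as the rating: one disastrous day barely moves the median, but it drags the mean down noticeably.

      from statistics import mean, median

      scores = [2.4, 2.1, 2.7, 2.3, 2.5, 0.0]   # one very bad day among decent ones

      print("mean rating:  ", round(mean(scores), 2))     # 2.0
      print("median rating:", round(median(scores), 2))   # 2.35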