Welcome to the Unofficial
PeakfqSA/EMA-Peak Home Page


hosted by Tim Cohn
6 October 2011

Frequently Asked Questions:

Q: Is this low-outlier method different from the Grubbs and Beck test in Bulletin 17B?

Answer: Bulletin 17B provides a fairly rigid structure for identifying outliers based on the Grubbs-Beck test (alpha = 10%). In some cases, engineering judgment is permitted:

  "If multiple values that have not been identified as outliers by the recommended procedure are very close to the threshold value, it may be desirable to test the sensitivity of the results to treating these values as outliers."
PeakfqSA/EMA-Peak employs an iterative scheme for identifying low outliers; the intent is that no "engineering judgment" is required. The algorithm iterates over five steps (a simplified Python sketch follows below):
  1. The LP3 distribution is fit to all the data, using both systematic and historical information. Regional skew information is employed.
  2. As in B17B, the Grubbs-Beck test is applied to identify a critical value.
  3. The smallest observation exceeding the critical value becomes a new low-outlier threshold. The low-outlier threshold applies to all of the data.
  4. Observations smaller than the low-outlier threshold are classified as low outliers. The magnitude of each such observation is recoded as the interval (Q_{Y,lower}, Q_{Y,upper}) = (0, T_L), where T_L is the magnitude of the low-outlier threshold.
  5. The observational thresholds for all the data are adjusted to reflect T_L as a lower bound on what would have been observed.
The sequence is repeated until no additional low outliers are identified.
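
For concreteness, here is a minimal Python sketch of the loop. It simplifies step 1 by skipping the EMA refit of the LP3 distribution (the critical value is recomputed from the retained observations alone), uses the Bulletin 17B Appendix 4 approximation for the 10% one-sided critical value K_N, and assumes all peaks are positive. It illustrates the looping logic, not the PeakfqSA implementation.

    import numpy as np

    def grubbs_beck_critical(logq):
        # 10% one-sided Grubbs-Beck critical value, using the K_N
        # approximation from Bulletin 17B (roughly valid for 10 <= N <= 149)
        n = len(logq)
        kn = -0.9043 + 3.345 * np.sqrt(np.log10(n)) - 0.4046 * np.log10(n)
        return np.mean(logq) - kn * np.std(logq, ddof=1)

    def identify_low_outliers(peaks):
        # Iterate steps 2-4 until no additional low outliers are identified.
        # Returns the final low-outlier threshold T_L and an outlier mask.
        q = np.asarray(peaks, dtype=float)
        t_low = 0.0
        while True:
            retained = q[q >= t_low]
            xc = grubbs_beck_critical(np.log10(retained))
            above = retained[np.log10(retained) > xc]
            if above.size == 0:          # everything would be censored; stop
                break
            new_t = above.min()          # smallest obs exceeding critical value
            if new_t <= t_low:           # no new outliers identified; done
                break
            t_low = new_t
        return t_low, q < t_low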

Q: What about high outliers?

Answer: EMA assumes that all historical information is already being employed in the frequency analysis, and, like Bulletin 17B, it would under no circumstances remove a high outlier from a dataset. Thus, with EMA there is no reason to recognize "high outliers" or to treat unusually high values any differently than one would treat other values.

EMA-Peak does report the percentage by which each value in the calibration dataset deviates from the fitted model. This provides an indication of whether any values in the dataset are unusual.
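
As an illustration of what such a diagnostic might look like (this is not PeakfqSA's code; it assumes a simple method-of-moments LP3 fit and Weibull plotting positions):

    import numpy as np
    from scipy.stats import pearson3, skew

    def percent_deviations(peaks):
        # Percent deviation of each sorted peak from the LP3 quantile
        # fitted at the same (Weibull) plotting position.
        q = np.sort(np.asarray(peaks, dtype=float))
        logq = np.log10(q)
        n = len(logq)
        pp = np.arange(1, n + 1) / (n + 1.0)            # plotting positions
        m, s, g = logq.mean(), logq.std(ddof=1), skew(logq, bias=False)
        fitted = 10.0 ** (m + s * pearson3.ppf(pp, g))  # LP3 quantiles, cfs
        return 100.0 * (q - fitted) / fitted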

Q: How do I get PeakfqSA/EMA-Peak to run under Mac OS X?

Answer:
  1. Download the zip file.
  2. Double-click on it to create a new subdirectory.
  3. Open the Terminal application.
  4. Use the "cd" command to make the current directory the one containing PeakfqSA.out.
  5. Enter the command "./PeakfqSA.out BigSandy_hist.spc" (the "./" prefix is needed because Terminal does not normally search the current directory; if you get a permissions error, first make the file executable with "chmod +x PeakfqSA.out").
The program should run in a few seconds and create several output files that you can then inspect.

Q: What does EMA do about adjusting the thresholds to correspond to the low-outlier test?

Answer: The final Grubbs-Beck critical value is used to identify low outliers. Because values below this level will be treated as censored, the lower bound on the observational thresholds for all observations is set equal to this value.

Q: How do I specify a higher censoring threshold for all my data?

Answer: There are several ways to do this, depending on the situation. First, if your gage has a lower bound of detection of, say, 100 [cfs], you would set the gage base discharge to 100 [cfs]. Second, if you wanted to specify a low-outlier threshold, you could set the low-outlier value to 100 [cfs]; note, however, that in this case the corresponding lower threshold would be the magnitude of the smallest observation in the dataset that exceeds the low-outlier threshold. Finally, you could simply specify the lower bound on the thresholds as 100 [cfs].
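
To make the first representation concrete (a hypothetical sketch of the underlying data model, not PeakfqSA input syntax): a 100 [cfs] detection limit turns each peak below 100 [cfs] into the interval (0, 100), and the perception threshold for every year becomes (100, Infinity).

    import numpy as np

    INF = np.inf
    peaks = [850.0, 95.0, 1200.0, 60.0, 430.0]   # hypothetical annual peaks, cfs
    t_lower = 100.0                              # detection limit / gage base

    # Peaks below the limit become interval-censored observations (0, 100);
    # peaks at or above it remain exact values (q, q).
    observations = [(q, q) if q >= t_lower else (0.0, t_lower) for q in peaks]
    thresholds = [(t_lower, INF)] * len(peaks)   # perception threshold per year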

Q: What is the meaning of the historical years that have a flow range of 0 to 18,000 [cfs], and a perception threshold of 18,000 [cfs] to infinity?

Answer: The "perception threshold" in EMA is the discharge that would have been recorded in a given year if a flood of that magnitude or greater had occurred. For years when a streamgage was operating (i.e. systematic gaging; since 1930 on the Big Sandy River) the "perception threshold" is the "gage base discharge" which is usually 0 [cfs] -- we record everything. Between about 1890 and 1930 on the Big Sandy we have records of big floods (e.g. 1897). These were recorded because they were big; we don't know much about the magnitudes of the smaller floods except that they were small. Thus we have a "perception threshold," which turns out to be about 18000 [cfs]. To confuse things a bit, EMA also allows you also to specify an upper bound on what would have been observed. This addresses two situations that arise in practice: First, a REALLY big flood will destroy the streamgage; second, in some cases (for example, where there are multiple physical processes associated with the flood hazard), we will know when a big event occurred but can only specify a lower bound on the equivalent flood magnitude. To be specific: For the Big Sandy River, we have gage record from 1930 to 1973. The corresponding "perception thresholds" (aka "observational thresholds") for every year between 1930 and 1973 are represented in EMA by (Tl,Tu) = (0,Infinity). From about 1890 to 1929 we saw three big floods, the smallest of which was 18500 [cfs]. So, roughly, we can infer that floods above 18000 [cfs] would have been recorded. So the "perception thresholds" corresponding to years in this part of the record are (Tl,Tu) = (18000,Infinity). For extra credit: What are the "perception thresholds" for 1974-2007? (In fact, we have gage record for this period, so the correct answer is (Tl,Tu) = (0,Infinity)). However, if we ignore the gage record, it seems certain that, had a big flood occurred, we would have a record of it. Thus (Tl,Tu) = (18000,Infinity) might, again, be reasonable.

Q: Are the "small" floods (those that did not exceed the "perception threshold" of 18000 [cfs]) considered to be exactly 18,000 [cfs]?

Answer: No. They are treated as type I censored values, with unknown magnitudes between 0 and 18,000 [cfs]. EMA handles this sort of categorical data. Two important facts about censoring the "small" values are that doing so does not (substantially) bias estimates of extreme flood quantiles and does not substantially increase the variability of those estimates. Almost all of the information about the frequency of extreme floods is contained in the right-hand tail of the sample.
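
The statistical content of a censored value is easiest to see in likelihood terms: an exact observation contributes its density, while a censored observation contributes only the probability of its interval. The sketch below shows this for a Pearson III distribution fitted to log10 flows; note that EMA itself is an expected-moments iteration rather than a direct likelihood maximization, so this is a conceptual illustration, not the EMA algorithm.

    import numpy as np
    from scipy.stats import pearson3

    def lp3_loglik(m, s, g, exact_logq, censored_bounds):
        # Log-likelihood for Pearson III on log10 peaks; each censored
        # observation contributes log[F(hi) - F(lo)] instead of log f(q).
        dist = pearson3(g, loc=m, scale=s)
        ll = np.sum(dist.logpdf(exact_logq))
        for lo, hi in censored_bounds:
            ll += np.log(dist.cdf(hi) - dist.cdf(lo))
        return ll

    # A "small" Big Sandy flood: somewhere in (0, 18000] cfs, which is
    # (-inf, log10(18000)] on the log scale.
    bounds = [(-np.inf, np.log10(18000.0))]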

Q: Version 0.970 of PeakfqSA includes options for incorporating regional information about the mean and standard deviation into the analysis -- when would this be useful?

Answer: In some rare cases, mostly involving desert sites in the West, it is hard to find records with even a dozen non-zero annual peak flows. In such cases, at-site estimators for all three parameters, {M, S, G}, tend to be relatively unstable, and it may be desirable to employ regional information for all three; Bulletin 17B recommends that regional information be employed only for estimating the skew {G}.
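
The standard way to combine an at-site estimate with a regional one is inverse-MSE weighting, which is what Bulletin 17B prescribes for skew. A sketch follows, under the assumption that PeakfqSA's weighted options for M and S apply the same idea:

    def weighted_estimate(at_site, mse_at_site, regional, mse_regional):
        # Inverse-MSE (inverse-variance) weighting of two independent
        # estimates of the same parameter.
        w_at, w_reg = 1.0 / mse_at_site, 1.0 / mse_regional
        return (w_at * at_site + w_reg * regional) / (w_at + w_reg)

    # Example: at-site skew -0.45 with MSE 0.36, regional skew -0.30 with
    # MSE 0.302 (the value B17B assigns to its national skew map)
    g_weighted = weighted_estimate(-0.45, 0.36, -0.30, 0.302)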

Q: The default PeakfqSA SDOPT is STATION - is STATION the correct SDOPT to use for WEIGHTED SKEWOPT analyses?

Answer: For Iowa the answer is yes; you will be using regional information for the skew (G), but not for the mean (M) or the standard deviation (S). SDOPT and MEANOPT are two features needed only rarely, for example in the Mojave Desert of California.

Q: The default PeakfqSA A_S_Skew_OPT is ADJE - is ADJE the correct PeakfqSA At-Site Skew Method to use for WEIGHTED SKEWOPT analyses?

Answer: Yes. ADJE means that the estimated MSE will coincide with Bulletin 17B's at sites with only systematic data. At sites with low outliers, historical information, or other non-standard data, ADJE will correctly represent the MSE (B17B does not correctly compute the MSE in such cases).

Q: The IA peak-flow study includes many CSGs with censored data; should the PeakfqSA SDOPT or A_S_Skew_OPT options be set any differently for censored data?

Answer: No. The default options are almost certainly what you want to use. In IA, where relatively long datasets are available, you should use "weighted" skew and "station" mean and standard deviation.

Site last updated 06 Oct 2011

Comments or suggestions? Email tim@timcohn.com