Stationary Quantile Matching

This example shows the functionalities of QuantileMatching.jl. They are illustrated by post-processing the simulated precipitations of the kda member from the ClimEx ensemble (Leduc et al., 2019) with respect to the observed precipitations recorded at the Montréal-Trudeau International Airport. To avoid seasonality, we only consider precipitation from May to October and to avoid non-stationarity, we restrict the period from 1955 to 2010.

The precipitation quantile matching procedure is performed in two steps. The first step is to match the empirical proportion of wet days between observations and simulations. The second step is to match the quantiles of simulated non-zero precipitation.

Load required Julia packages

Before executing this tutorial, make sure to have installed the following packages:

  • CSV.jl (for loading the data)
  • DataFrames.jl (for using the DataFrame type)
  • Distributions.jl (for using distribution objects)
  • ExtendedExtremes.jl (for modeling the bulk and the tail of a distribution)
  • Extremes.jl (for modeling the tail of a distribution)
  • Plots.jl (for plotting)
  • QuantileMatching.jl

and import them using the following command:

julia> using CSV, DataFrames, Dates, Distributions
julia> using ExtendedExtremes, Extremes, QuantileMatching
julia> using Plots

Data preparation

Loading the observed daily precipitations (in mm)

# Load the data into a DataFrame
obs = QuantileMatching.dataset("observations")
dropmissing!(obs)

# Extract the values in a vector (including the zeros)
y = obs.Precipitation

Plot the observed series for the year 1955:

df = filter(row -> year(row.Date)==1955, obs)

Plots.plot(df.Date, df.Precipitation, label="", color=:grey,
    xlabel="Date", ylabel="Observed precipitation (mm)")
Example block output

Loading the simulated daily precipitations (in mm)

# Load the data into a DataFrame
sim = QuantileMatching.dataset("simulations")
dropmissing!(sim)

# Extract the values in a vector (including the zeros)
x = sim.Precipitation

Plot the observed series for the year 1955:

df = filter(row -> year(row.Date)==1955, sim)

Plots.plot(df.Date, df.Precipitation, label="", color=:grey,
    xlabel="Date", ylabel="Simulated precipitation (mm)")
Example block output

Frequency adjustment

Compute the proportion of wet days in the observations:

julia> p = pwet(y)0.40563241106719367

Find the threshold for which the proportion of wet days in the simulations matches the proportion of wet days in the observations:

julia> u = wet_threshold(x, p)0.34030526876449585

Censor the value below the threshold:

x̃  = censor(x, u)

Extraction of non-zero precipitations

Extract the observed non-zero precipitations:

y⁺ = filter(val -> val >0, y)

Extract the frequency adjusted non-zero simulated precipitations:

x⁺ = filter(val -> val >0, x̃)

Store the frequency adjusted non-zero simulated precipitation in a DataFrame

results = DataFrame(Date = sim.Date[sim.Precipitation.>u])
results[:, :RAW] =  x⁺
first(results,5)
5×2 DataFrame
RowDateRAW
DateFloat64
11955-05-1011.9656
21955-05-112.37501
31955-05-135.1358
41955-05-142.80354
51955-05-181.42283

Empirical Quantile Matching

Model definition

Defining the model:

julia> qmm = EmpiricalQuantileMatchingModel(y⁺, x⁺)EmpiricalQuantileMatchingModel{Stationary}
    targetsample::Vector{Float64}[4105] = [0.2,...,81.9]
    actualsample::Vector{Float64}[4178] = [0.0017,...,102.7605]
    nbins: 4178
    extrapolation: Flat()

Quantile matching

Quantile matching of the non-zero simulated precipitations:

x̃⁺ = match.(qmm, x⁺)

Comparison with the observations

Comparing the post-processed simulated precipitation distribution with the one for the observations

QuantileMatching.qqplot(y⁺, x̃⁺)
Example block output

Fill in with dry days

Adding the zeros from the frequency adjusted values:

x̃[x̃ .> 0] = x̃⁺

Store the corrected non-zero precipitation in the results DataFrame

results[:, :EQM] = x̃⁺
first(results,5)
5×3 DataFrame
RowDateRAWEQM
DateFloat64Float64
11955-05-1011.965611.0
21955-05-112.375012.3
31955-05-135.13584.8
41955-05-142.803542.6
51955-05-181.422831.3

Gamma Quantile Matching

Piani et al. (2010) propose to use the Gamma distribution to perform parametric quantile matching of precipitation. This parametric quantile matching method is referred to here as Gamma Parametric Quantile Matching (GPQM).

Model definition

Modelling the non-zero observed precipitations with the Gamma distribution:

julia> fd_Y = fit(Gamma, y⁺)Distributions.Gamma{Float64}(α=0.7295092921426495, θ=9.333188848435453)

Goodness-of-fit of the model:

plot(QuantileMatching.histplot(y⁺, fd_Y, 0, 50),
    QuantileMatching.qqplot(y⁺, fd_Y))
Example block output
Note

In this case, the Gamma distribution does not fit well the data. It is not surprising since precipitation is usually heavy-tailed but the Gamma distribution is light-tailed.

Modeling the non-zero simulated precipitations with the Gamma distribution:

julia> fd_X = fit(Gamma, x⁺)Distributions.Gamma{Float64}(α=0.6696322823377227, θ=10.943615623045797)

Goodness-of-fit of the model:

plot(QuantileMatching.histplot(x⁺, fd_X, 0, 50),
    QuantileMatching.qqplot(x⁺, fd_X))
Example block output

Defining the model:

julia> qmm = ParametricQuantileMatchingModel(fd_Y, fd_X)ParametricQuantileMatchingModel{Stationary}
    targetdist: Distributions.Gamma{Float64}(α=0.7295092921426495, θ=9.333188848435453)
    actualdist: Distributions.Gamma{Float64}(α=0.6696322823377227, θ=10.943615623045797)

Quantile matching

Quantile matching of the non-zero simulated precipitations:

x̃⁺ = match.(qmm, x⁺)

Comparison with the observations

Comparing the post-processed simulated precipitation distribution with the one for the observations

QuantileMatching.qqplot(y⁺, x̃⁺)
Example block output
Note

The corrected value distribution is not matching well the observation distribution. It is notably due to the fact that the Gamma distribution is light-tailed while precipitation is heavy-tailed.

Store the corrected non-zero precipitation in the results DataFrame

results[:, :GQM] = x̃⁺
first(results,5)
5×4 DataFrame
RowDateRAWEQMGQM
DateFloat64Float64Float64
11955-05-1011.965611.011.0904
21955-05-112.375012.32.40297
31955-05-135.13584.84.96291
41955-05-142.803542.62.80708
51955-05-181.422831.31.48924

Gamma-Generalized Pareto Quantile Matching

Gutjahr and Heinemann (2013) propose to use the Gamma distribution for the bulk and the generalized Pareto distribution for the right tail, i.e above the 95th quantile of non-zero precipitation. This parametric quantile matching method is referred to here as Gamma-Generalized Pareto Quantile Matching (GGPQM).

Model definition

Modelling the non-zero observed precipitations with the Gamma-Generalized Pareto distribution:

julia> # Threshold selection for the tail
       u = quantile(y⁺, .95)
       
       # Fit the Gamma distribution for the bulk24.8
julia> f₁ = truncated(fit(Gamma, y⁺), 0, u) # Fit the Generalized Pareto distribution for the right tailTruncated(Distributions.Gamma{Float64}(α=0.7295092921426495, θ=9.333188848435453); lower=0.0, upper=24.8)
julia> fd = Extremes.gpfit( y⁺[y⁺.>u] .-u )MaximumLikelihoodAbstractExtremeValueModel model : ThresholdExceedance data : Vector{Float64}[201] logscale : ϕ ~ 1 shape : ξ ~ 1 θ̂ : [2.500282369778531, -0.037218218094625825]
julia> f₂ = GeneralizedPareto(u, exp(fd.θ̂[1]), fd.θ̂[2]) # Combining the two distributionsDistributions.GeneralizedPareto{Float64}(μ=24.8, σ=12.185934414542345, ξ=-0.037218218094625825)
julia> fd_Y = MixtureModel([f₁, f₂], [.95, .05])MixtureModel{Distributions.Distribution{Distributions.Univariate, Distributions.Continuous}}(K = 2) components[1] (prior = 0.9500): Truncated(Distributions.Gamma{Float64}(α=0.7295092921426495, θ=9.333188848435453); lower=0.0, upper=24.8) components[2] (prior = 0.0500): Distributions.GeneralizedPareto{Float64}(μ=24.8, σ=12.185934414542345, ξ=-0.037218218094625825)

Goodness-of-fit of the model:

plot(QuantileMatching.histplot(y⁺, fd_Y, 0, 50),
    QuantileMatching.qqplot(y⁺, fd_Y))
Example block output

Modelling the non-zero simulated precipitations with the Gamma-Generalized Pareto distribution:

julia> # Threshold selection for the tail
       u = quantile(x⁺, .95)
       
       # Fit the Gamma distribution for the bulk26.615014517307287
julia> f₁ = truncated(fit(Gamma, x⁺), 0, u) # Fit the Generalized Pareto distribution for the right tailTruncated(Distributions.Gamma{Float64}(α=0.6696322823377227, θ=10.943615623045797); lower=0.0, upper=26.615014517307287)
julia> fd = Extremes.gpfit( x⁺[x⁺.>u] .-u )MaximumLikelihoodAbstractExtremeValueModel model : ThresholdExceedance data : Vector{Float64}[209] logscale : ϕ ~ 1 shape : ξ ~ 1 θ̂ : [2.5718318366757327, -0.05638788031685149]
julia> f₂ = GeneralizedPareto(u, exp(fd.θ̂[1]), fd.θ̂[2]) # Combining the two distributionsDistributions.GeneralizedPareto{Float64}(μ=26.615014517307287, σ=13.089780832743658, ξ=-0.05638788031685149)
julia> fd_X = MixtureModel([f₁, f₂], [.95, .05])MixtureModel{Distributions.Distribution{Distributions.Univariate, Distributions.Continuous}}(K = 2) components[1] (prior = 0.9500): Truncated(Distributions.Gamma{Float64}(α=0.6696322823377227, θ=10.943615623045797); lower=0.0, upper=26.615014517307287) components[2] (prior = 0.0500): Distributions.GeneralizedPareto{Float64}(μ=26.615014517307287, σ=13.089780832743658, ξ=-0.05638788031685149)

Goodness-of-fit of the model:

plot(QuantileMatching.histplot(x⁺, fd_X, 0, 50),
    QuantileMatching.qqplot(x⁺, fd_X))
Example block output

Defining the model:

julia> qmm = ParametricQuantileMatchingModel(fd_Y, fd_X)ParametricQuantileMatchingModel{Stationary}
    targetdist: MixtureModel{Distributions.Distribution{Distributions.Univariate, Distributions.Continuous}}(K = 2)
components[1] (prior = 0.9500): Truncated(Distributions.Gamma{Float64}(α=0.7295092921426495, θ=9.333188848435453); lower=0.0, upper=24.8)
components[2] (prior = 0.0500): Distributions.GeneralizedPareto{Float64}(μ=24.8, σ=12.185934414542345, ξ=-0.037218218094625825)

    actualdist: MixtureModel{Distributions.Distribution{Distributions.Univariate, Distributions.Continuous}}(K = 2)
components[1] (prior = 0.9500): Truncated(Distributions.Gamma{Float64}(α=0.6696322823377227, θ=10.943615623045797); lower=0.0, upper=26.615014517307287)
components[2] (prior = 0.0500): Distributions.GeneralizedPareto{Float64}(μ=26.615014517307287, σ=13.089780832743658, ξ=-0.05638788031685149)

Quantile matching

Quantile matching of the non-zero simulated precipitations:

x̃⁺ = match.(qmm, x⁺)

Comparison with the observations

Comparing the post-processed simulated precipitation distribution with the one for the observations

QuantileMatching.qqplot(y⁺, x̃⁺)
Example block output

Store the corrected non-zero precipitation in the results DataFrame

results[:, :GGPQM] = x̃⁺
first(results,5)
5×5 DataFrame
RowDateRAWEQMGQMGGPQM
DateFloat64Float64Float64Float64
11955-05-1011.965611.011.090411.2319
21955-05-112.375012.32.402972.41975
31955-05-135.13584.84.962915.00396
41955-05-142.803542.62.807082.82719
51955-05-181.422831.31.489241.49904

Extended Generalized Pareto Quantile Matching

Gobeil et al. (2023, submitted) propose to use the extended Generalized Pareto distribution developed by Gamet and Jalbert (2022) to perform parametric quantile matching of precipitation. This parametric quantile matching method is referred to here as Extended Generalized Pareto Quantile Matching (EGPQM).

Model definition

Modelling the non-zero observed precipitations with the extended Generalized Pareto distribution:

julia> fd_Y = fit(ExtendedGeneralizedPareto{TBeta}, y⁺)ExtendedExtremes.ExtendedGeneralizedPareto{ExtendedExtremes.TBeta{Float64}}(
V: ExtendedExtremes.TBeta{Float64}(α=0.2600426944386879)
G: Distributions.GeneralizedPareto{Float64}(μ=0.0, σ=8.742467370007379, ξ=0.07293435088986122)
)

Goodness-of-fit of the model:

plot(QuantileMatching.histplot(y⁺, fd_Y, 0, 50),
    QuantileMatching.qqplot(y⁺, fd_Y))
Example block output

Modelling the non-zero simulated precipitations with the extended Generalized Pareto distribution:

julia> fd_X = fit(ExtendedGeneralizedPareto{TBeta}, x⁺)ExtendedExtremes.ExtendedGeneralizedPareto{ExtendedExtremes.TBeta{Float64}}(
V: ExtendedExtremes.TBeta{Float64}(α=0.17021521793893313)
G: Distributions.GeneralizedPareto{Float64}(μ=0.0, σ=10.036043780211887, ξ=0.04700069268683122)
)

Goodness-of-fit of the model:

plot(QuantileMatching.histplot(x⁺, fd_X, 0, 50),
    QuantileMatching.qqplot(x⁺, fd_X))
Example block output

Defining the model:

julia> qmm = ParametricQuantileMatchingModel(fd_Y, fd_X)ParametricQuantileMatchingModel{Stationary}
    targetdist: ExtendedExtremes.ExtendedGeneralizedPareto{ExtendedExtremes.TBeta{Float64}}(
V: ExtendedExtremes.TBeta{Float64}(α=0.2600426944386879)
G: Distributions.GeneralizedPareto{Float64}(μ=0.0, σ=8.742467370007379, ξ=0.07293435088986122)
)

    actualdist: ExtendedExtremes.ExtendedGeneralizedPareto{ExtendedExtremes.TBeta{Float64}}(
V: ExtendedExtremes.TBeta{Float64}(α=0.17021521793893313)
G: Distributions.GeneralizedPareto{Float64}(μ=0.0, σ=10.036043780211887, ξ=0.04700069268683122)
)

Quantile matching

Quantile matching of the non-zero simulated precipitations:

x̃⁺ = match.(qmm, x⁺)

Comparison with the observations

Comparing the post-processed simulated precipitation distribution with the one for the observations

QuantileMatching.qqplot(y⁺, x̃⁺)
Example block output
results[:, :EGPQM] = x̃⁺
first(results,5)
5×6 DataFrame
RowDateRAWEQMGQMGGPQMEGPQM
DateFloat64Float64Float64Float64Float64
11955-05-1011.965611.011.090411.231911.1357
21955-05-112.375012.32.402972.419752.31107
31955-05-135.13584.84.962915.003964.89338
41955-05-142.803542.62.807082.827192.71734
51955-05-181.422831.31.489241.499041.39763

Comparison of quantile matched data

v = [y⁺]

for col in [:RAW, :EQM, :GQM, :GGPQM, :EGPQM]
    push!(v, results[:,col])
end

df_indices = DataFrame(Index=String[],
    Observations=Float64[],
    RAW=Float64[],
    EQM=Float64[],
    GQM=Float64[],
    GGPQM=Float64[],
    EGPQM=Float64[])

push!(df_indices, ["SDII", wet_mean.(v)...])
push!(df_indices, ["skewness", skewness.(v)...])
push!(df_indices, ["R10", pwet.(v,10)...])
push!(df_indices, ["p90wet", wet_quantile.(v, .9, 1)...])
push!(df_indices, ["p98wet", wet_quantile.(v, .98, 1)...])
push!(df_indices, ["p98wetamountmean", wet_mean.(v, wet_quantile.(v, .98, 1))...])
6×7 DataFrame
RowIndexObservationsRAWEQMGQMGGPQMEGPQM
StringFloat64Float64Float64Float64Float64Float64
1SDII6.808657.32826.807736.796996.963956.87382
2skewness2.781322.705462.780742.604072.784262.81548
3R100.2153470.2436570.220440.2288180.231450.229296
4p90wet20.621.512820.097919.400519.858219.9086
5p98wet38.641.252238.063236.553238.479338.5494
6p98wetamountmean50.418652.410349.493446.119449.241449.5585
obs[:,:Year] = year.(obs.Date)

df_annualmaxima_obs = combine(groupby(obs, :Year), :Precipitation => maximum => :Observation)

results[:,:Year] = year.(results.Date)

df_annualmaxima_sim = combine(groupby(results, :Year), :RAW => maximum => :RAW)

df_annualmaxima_sim = combine(groupby(results, :Year),
    [col => maximum => col for col in [:RAW, :EQM, :GQM, :GGPQM, :EGPQM]])

df_annualmaxima = innerjoin(df_annualmaxima_obs, df_annualmaxima_sim, on=:Year)

RV20 = Float64[]
RV100 = Float64[]

for col in [:Observation, :RAW, :EQM, :GQM, :GGPQM, :EGPQM]
    fittedGEV = gevfit(df_annualmaxima[:,col])
    push!(RV20, returnlevel(fittedGEV, 20).value[])
    push!(RV100, returnlevel(fittedGEV, 100).value[])
end

push!(df_indices, ["RV20", RV20...])
push!(df_indices, ["RV100", RV100...])
8×7 DataFrame
RowIndexObservationsRAWEQMGQMGGPQMEGPQM
StringFloat64Float64Float64Float64Float64Float64
1SDII6.808657.32826.807736.796996.963956.87382
2skewness2.781322.705462.780742.604072.784262.81548
3R100.2153470.2436570.220440.2288180.231450.229296
4p90wet20.621.512820.097919.400519.858219.9086
5p98wet38.641.252238.063236.553238.479338.5494
6p98wetamountmean50.418652.410349.493446.119449.241449.5585
7RV2073.999578.103173.938668.408474.740175.4223
8RV10095.998594.629188.750482.572691.717692.952