As part of the World Winter Cycling Congress in Calgary, I’m presenting a paper entitled “Wondering What’s Winter Weather”, which is a data-driven look at what weather factors impact cycling. This is the data dump / methodology version of the post; I’ll produce another in a few days that is words and brightly-coloured charts, so if you don’t want the gory guts, follow the blog and you’ll see it pop up pretty soon.
Here’s my slide deck: K_Stefan_WBC_05
We are lucky enough here in Calgary that the weather is less than perfectly correlated with the season — instead of winter being a constant -10 and snowy, the high temperature can be +12 like it was the weekend before last… or -25 like it was yesterday. That means it’s easier to tease apart different weather factors, and to try to separate weather effects from seasonal effects.
I combined 4 years of data from the cycle counters at the Peace Bridge and the 5th St underpass with weather data and estimated four models, one for each of four “segments”. (Thanks to Chealion’s GitHub which helped immensely with extracting the count data.) Each segment represents cyclists in different time periods, detailed at the top of each model below. These are log-linear regressions; ie linear regressions where the dependent variable is ln(number of cyclists); this transformation was done when building the dataset. This has a number of advantages; the model can’t go negative (and neither can the counts), and the results are elasticities, so instead of rain resulting in 100 fewer cyclists, the model will say rain results in 10% fewer.
The elasticities for each parameter are calculated by using exp(parameter)-1, but when they’re close to 0 (as most of them are), you can get an approximate sense of what they are by reading them directly. For instance, the parameter for snow falling on a day for the peak segment is -0.06856; this means that snow falling results in a reduction in cyclists of: (e^-0.06856) – 1 = -0.0663, which is a 6.63% reduction (and pretty close to the 0.06856 parameter).
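As a rough illustration of that conversion, here is a minimal Python sketch (the models themselves were estimated in R; the function name `pct_change` is made up for this example). It applies exp(parameter) - 1 to two coefficients taken from the peak model output: the small SnowFell parameter, where reading the value directly works well, and the much larger PeaceConst parameter, where it does not.

```python
import math


def pct_change(coef: float) -> float:
    """Proportional change in the count implied by a log-linear coefficient."""
    return math.exp(coef) - 1.0


# Coefficients copied from the peak model output; names are just labels.
snow_fell_peak = -0.06856
peace_const_peak = -1.137

for label, coef in [("SnowFell", snow_fell_peak), ("PeaceConst", peace_const_peak)]:
    exact = pct_change(coef)
    # Show the exact effect next to the "read it directly" approximation.
    print(f"{label}: exact {exact:+.1%}, read directly {coef:+.1%}")
```

The shortcut is fine for parameters near zero, but a parameter of -1.137 clearly cannot mean a 114% reduction, so larger values need the full calculation.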
The full model estimations are below; there’s one for each segment. The parameters are:
- LocPeace – Dummy for all Peace Bridge counts; effectively establishes two intercepts, one for 5th St and one for the Peace Bridge, so a “typical” day (+10 degrees, dry, no wind, in April) can have different average numbers of cyclists at each location.
- SnowFell – Binary, 1 if snow fell on the day
- SnowAmt – Amount of snow that fell (in cm)
- SnowExists – Binary, 1 if any snow on the ground
- SnowDepth – Depth of snow on ground (in cm)
- RainExists – Binary, 1 if any rain fell that day
- RainAmt – Amount of rain that fell (in cm)
- TmaxT etc. – Nonlinear function of temperature (in Celsius). TmaxT is the daily maximum temperature minus 10 degrees (ie the function is relative to a 10 degree day); P and N mark additional parameters that apply when TmaxT is positive or negative respectively (ie when the high is above or below 10 degrees); the final number is the power that TmaxT is raised to. So if the daily high was 7 degrees, TmaxT is -3 and the temperature function is:
TmaxT * (-3) + TmaxTN2 * (-3)^2 + TmaxTN3 * (-3)^3
- Eve_Daylight – Number of hours of daylight in the evening
- MonthNameAug to MonthNameSep – Binary for each month (listed alphabetically; April is the omitted reference month)
- IsFri – Binary, 1 for Fridays
- PeaceConst – Binary, 1 for Peace Bridge in 2018 when construction meant detours on pathway in area
- YearVal – Used to capture growth; number of years since January 1, 2015 (ie this is the annual growth rate)
- LocPeace:YearVal – Second growth rate for Peace Bridge only; this permits different growth rates at each location
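The temperature function described in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `temperature_effect` and the dictionary layout are hypothetical, the P terms are assumed to be zero on days at or below 10 degrees and the N terms zero on days at or above it, and the powers are applied to the signed value of TmaxT, as in the 7 degree example. The coefficients are copied from the peak model.

```python
# Temperature coefficients copied from the peak model output.
PEAK_TEMP = {
    "TmaxT": 4.002e-02,
    "TmaxTP3": -4.129e-05,
    "TmaxTN2": 2.882e-03,
    "TmaxTN3": 1.037e-04,
}


def temperature_effect(tmax_c: float, coefs: dict) -> float:
    """Log-scale temperature contribution relative to a 10 degree day."""
    t = tmax_c - 10.0  # TmaxT: daily high minus 10 degrees
    total = coefs.get("TmaxT", 0.0) * t
    if t > 0:
        # Assumption: "P" terms only apply above 10 degrees.
        total += coefs.get("TmaxTP3", 0.0) * t ** 3
    elif t < 0:
        # Assumption: "N" terms only apply below 10 degrees.
        total += coefs.get("TmaxTN2", 0.0) * t ** 2
        total += coefs.get("TmaxTN3", 0.0) * t ** 3
    return total


for high in (-25, 7, 10, 25):
    print(high, round(temperature_effect(high, PEAK_TEMP), 4))
```

Terms a model does not include (for example TmaxTN3 in the evening model) are simply treated as zero by the `.get` lookups.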
The peak segment is cyclists travelling into downtown between 7 and 9 AM, or out of downtown between 4 and 6 PM, on weekdays (excluding holidays). I think it’s mostly commuters; it’s the least sensitive to weather conditions overall.
Call:
lm(formula = LnPeak ~ Loc + SnowFell + SnowDepth + SnowExists +
    SnowAmt + RainAmt + YearVal + TmaxT + TmaxTP3 + TmaxTN2 +
    TmaxTN3 + YearVal * Loc + PeaceConst + MonthName + IsFri,
    data = Bike2)

Residuals:
    Min      1Q  Median      3Q     Max
-4.4931 -0.1288  0.0196  0.1621  1.1117

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       5.912e+00  4.283e-02 138.049  < 2e-16 ***
LocPeace          4.151e-01  3.410e-02  12.172  < 2e-16 ***
SnowFell         -6.856e-02  2.920e-02  -2.348 0.018975 *
SnowDepth        -1.889e-02  3.251e-03  -5.809 7.45e-09 ***
SnowExists       -2.292e-01  3.216e-02  -7.127 1.51e-12 ***
SnowAmt          -3.317e-02  5.618e-03  -5.904 4.27e-09 ***
RainAmt          -2.437e-02  2.543e-03  -9.583  < 2e-16 ***
YearVal           5.422e-02  1.275e-02   4.253 2.22e-05 ***
TmaxT             4.002e-02  3.042e-03  13.155  < 2e-16 ***
TmaxTP3          -4.129e-05  7.359e-06  -5.610 2.36e-08 ***
TmaxTN2           2.882e-03  4.385e-04   6.574 6.49e-11 ***
TmaxTN3           1.037e-04  1.361e-05   7.615 4.32e-14 ***
PeaceConst       -1.137e+00  4.110e-02 -27.656  < 2e-16 ***
MonthNameAug      8.422e-02  4.375e-02   1.925 0.054400 .
MonthNameDec     -2.197e-01  5.174e-02  -4.246 2.29e-05 ***
MonthNameFeb     -2.971e-01  4.487e-02  -6.621 4.76e-11 ***
MonthNameJan     -3.409e-01  4.791e-02  -7.116 1.63e-12 ***
MonthNameJul      6.943e-02  4.570e-02   1.519 0.128915
MonthNameJun      2.569e-01  4.244e-02   6.053 1.74e-09 ***
MonthNameMar     -2.279e-01  4.078e-02  -5.589 2.65e-08 ***
MonthNameMay      1.434e-01  4.018e-02   3.569 0.000369 ***
MonthNameNov      6.086e-02  4.268e-02   1.426 0.154048
MonthNameOct      1.219e-01  3.977e-02   3.066 0.002206 **
MonthNameSep      1.948e-01  3.970e-02   4.906 1.02e-06 ***
IsFri            -2.973e-01  2.037e-02 -14.594  < 2e-16 ***
LocPeace:YearVal  1.779e-02  1.766e-02   1.008 0.313781
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3379 on 1723 degrees of freedom
  (716 observations deleted due to missingness)
Multiple R-squared:  0.8233, Adjusted R-squared:  0.8208
F-statistic: 321.2 on 25 and 1723 DF,  p-value: < 2.2e-16
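To show how these parameters turn back into cyclist counts, here is a small Python sketch using a handful of the peak coefficients above. It is an illustration rather than the original analysis code: the function `predict_peak` and its arguments are hypothetical names, and it only covers the baseline day (April, a 10 degree high, no rain or snow, no construction), so every weather and month term drops out.

```python
import math

# Selected coefficients copied from the peak model output.
PEAK = {
    "Intercept": 5.912,
    "LocPeace": 0.4151,
    "YearVal": 0.05422,
    "LocPeace:YearVal": 0.01779,
    "IsFri": -0.2973,
}


def predict_peak(peace_bridge: bool, years_since_2015: float = 0.0,
                 friday: bool = False) -> float:
    """Typical peak count on a baseline day (April, 10 degrees, dry, no snow)."""
    loc = 1.0 if peace_bridge else 0.0
    log_count = (
        PEAK["Intercept"]
        + PEAK["LocPeace"] * loc
        + PEAK["YearVal"] * years_since_2015
        + PEAK["LocPeace:YearVal"] * loc * years_since_2015
        + PEAK["IsFri"] * (1.0 if friday else 0.0)
    )
    # The model is fitted to ln(count), so exponentiate to get back to a count.
    return math.exp(log_count)


print("5th St baseline:", round(predict_peak(False)))
print("Peace Bridge baseline:", round(predict_peak(True)))
print("Peace Bridge annual growth:",
      f"{predict_peak(True, 1.0) / predict_peak(True, 0.0) - 1:.1%}")
```

Because everything is added on the log scale, each effect multiplies the count: the Peace Bridge growth rate is exp(YearVal + LocPeace:YearVal) - 1, whatever the starting level.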
The midday segment is cyclists travelling between 11 AM and 3 PM on weekdays (excluding holidays). It’s an interesting one: relatively insensitive to most weather conditions except temperature. I don’t have a lazy shorthand to describe this segment; it’s a mix of commuters, couriers, errand runners, leisure cyclists…
Call:
lm(formula = LnMid ~ Loc + SnowFell + SnowDepth + SnowExists +
    SnowAmt + RainAmt + Gust + YearVal + TmaxT + TmaxTP3 + TmaxTN2 +
    TmaxTN3 + YearVal * Loc + PeaceConst + MonthName + IsFri,
    data = Bike2)

Residuals:
    Min      1Q  Median      3Q     Max
-3.2184 -0.1956  0.0046  0.2247  1.6715

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       5.161e+00  4.435e-02 116.355  < 2e-16 ***
LocPeace         -3.326e-01  3.491e-02  -9.527  < 2e-16 ***
SnowFell         -1.056e-01  2.987e-02  -3.536 0.000417 ***
SnowDepth        -2.623e-02  3.322e-03  -7.895 5.11e-15 ***
SnowExists       -1.539e-01  3.287e-02  -4.683 3.05e-06 ***
SnowAmt          -2.542e-02  5.739e-03  -4.430 1.00e-05 ***
RainAmt          -3.022e-02  2.618e-03 -11.540  < 2e-16 ***
Gust             -2.336e-03  7.289e-04  -3.205 0.001375 **
YearVal           9.518e-03  1.302e-02   0.731 0.465013
TmaxT             6.708e-02  3.129e-03  21.438  < 2e-16 ***
TmaxTP3          -6.517e-05  7.550e-06  -8.632  < 2e-16 ***
TmaxTN2           2.436e-03  4.486e-04   5.430 6.43e-08 ***
TmaxTN3           4.176e-05  1.393e-05   2.997 0.002769 **
PeaceConst       -1.103e+00  4.207e-02 -26.230  < 2e-16 ***
MonthNameAug      3.440e-02  4.479e-02   0.768 0.442639
MonthNameDec     -1.937e-01  5.306e-02  -3.651 0.000269 ***
MonthNameFeb     -2.658e-01  4.596e-02  -5.783 8.69e-09 ***
MonthNameJan     -2.735e-01  4.911e-02  -5.570 2.96e-08 ***
MonthNameJul      8.873e-02  4.675e-02   1.898 0.057884 .
MonthNameJun      9.170e-02  4.345e-02   2.111 0.034955 *
MonthNameMar     -1.387e-01  4.174e-02  -3.322 0.000913 ***
MonthNameMay      1.346e-02  4.114e-02   0.327 0.743522
MonthNameNov      1.404e-02  4.370e-02   0.321 0.747995
MonthNameOct      6.396e-02  4.071e-02   1.571 0.116362
MonthNameSep      5.745e-02  4.079e-02   1.408 0.159211
IsFri             1.426e-01  2.081e-02   6.851 1.02e-11 ***
LocPeace:YearVal  1.004e-01  1.808e-02   5.552 3.26e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3452 on 1721 degrees of freedom
  (717 observations deleted due to missingness)
Multiple R-squared:  0.8581, Adjusted R-squared:  0.8559
F-statistic: 400.2 on 26 and 1721 DF,  p-value: < 2.2e-16
The evening segment is cyclists travelling between 7 PM and midnight on weekdays (excluding holidays); these are clearly leisure cyclists.
Call:
lm(formula = LnEve ~ Loc + SnowFell + SnowDepth + SnowExists +
    SnowAmt + RainAmt + Gust + RainExists + YearVal + TmaxT +
    TmaxTP3 + TmaxTN2 + Eve_Daylight + YearVal * Loc + PeaceConst +
    MonthName + DayName, data = Bike2)

Residuals:
    Min      1Q  Median      3Q     Max
-4.3100 -0.1933  0.0063  0.2152  1.7274

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       4.357e+00  9.464e-02  46.034  < 2e-16 ***
LocPeace          1.404e-01  3.734e-02   3.759 0.000176 ***
SnowFell         -1.374e-01  3.238e-02  -4.243 2.32e-05 ***
SnowDepth        -2.995e-02  3.550e-03  -8.436  < 2e-16 ***
SnowExists       -2.487e-01  3.523e-02  -7.060 2.41e-12 ***
SnowAmt          -3.444e-02  6.142e-03  -5.607 2.39e-08 ***
RainAmt          -3.578e-02  3.163e-03 -11.311  < 2e-16 ***
Gust             -7.106e-03  7.842e-04  -9.062  < 2e-16 ***
RainExists       -1.663e-01  3.236e-02  -5.139 3.08e-07 ***
YearVal           6.019e-03  1.394e-02   0.432 0.665904
TmaxT             6.135e-02  2.932e-03  20.924  < 2e-16 ***
TmaxTP3          -3.604e-05  7.559e-06  -4.767 2.02e-06 ***
TmaxTN2           1.007e-03  1.295e-04   7.781 1.24e-14 ***
Eve_Daylight      1.116e-01  3.213e-02   3.474 0.000525 ***
PeaceConst       -9.999e-01  4.497e-02 -22.233  < 2e-16 ***
MonthNameAug      2.540e-02  4.867e-02   0.522 0.601779
MonthNameDec     -3.949e-01  9.544e-02  -4.138 3.67e-05 ***
MonthNameFeb     -5.334e-01  9.123e-02  -5.847 5.98e-09 ***
MonthNameJan     -3.811e-01  9.420e-02  -4.046 5.44e-05 ***
MonthNameJul      1.814e-01  5.923e-02   3.063 0.002229 **
MonthNameJun      2.116e-01  5.991e-02   3.533 0.000422 ***
MonthNameMar     -2.558e-01  5.772e-02  -4.432 9.94e-06 ***
MonthNameMay      6.708e-02  4.943e-02   1.357 0.174954
MonthNameNov     -1.253e-01  9.269e-02  -1.352 0.176494
MonthNameOct     -5.726e-02  7.439e-02  -0.770 0.441571
MonthNameSep      1.004e-01  5.192e-02   1.933 0.053387 .
DayName3_Tue      6.756e-02  2.851e-02   2.370 0.017916 *
DayName4_Wed      1.080e-01  2.855e-02   3.783 0.000160 ***
DayName5_Thu      1.237e-01  2.851e-02   4.341 1.50e-05 ***
DayName6_Fri      6.581e-02  2.882e-02   2.283 0.022547 *
LocPeace:YearVal  8.305e-02  1.933e-02   4.296 1.83e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3694 on 1718 degrees of freedom
  (716 observations deleted due to missingness)
Multiple R-squared:  0.8942, Adjusted R-squared:  0.8924
F-statistic: 484.1 on 30 and 1718 DF,  p-value: < 2.2e-16
The weekend segment is cyclists travelling between 1 and 7 PM on weekends (excluding holidays). These are also leisure cyclists; they are generally a little less sensitive to weather than evening cyclists, except for temperature. I suspect more of these are suburban residents out for a ride, whereas evening cyclists are probably more local inner-city residents.
Call:
lm(formula = LnWE ~ Loc + SnowFell + SnowDepth + SnowAmt + RainAmt +
    Gust + RainExists + YearVal + TmaxT + TmaxTP3 + TmaxTN3 +
    YearVal * Loc + PeaceConst + MonthName, data = Bike2)

Residuals:
     Min       1Q   Median       3Q      Max
-2.64757 -0.18733  0.01455  0.22306  1.08863

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       5.414e+00  8.319e-02  65.078  < 2e-16 ***
LocPeace          5.495e-01  6.787e-02   8.095 2.58e-15 ***
SnowFell         -1.984e-01  6.103e-02  -3.251 0.001205 **
SnowDepth        -2.964e-02  6.297e-03  -4.708 3.03e-06 ***
SnowAmt          -7.645e-02  1.436e-02  -5.325 1.37e-07 ***
RainAmt          -2.812e-02  6.393e-03  -4.399 1.26e-05 ***
Gust             -7.440e-03  1.433e-03  -5.192 2.74e-07 ***
RainExists       -2.065e-01  5.402e-02  -3.822 0.000144 ***
YearVal          -1.731e-02  2.524e-02  -0.686 0.493117
TmaxT             1.120e-01  4.897e-03  22.874  < 2e-16 ***
TmaxTP3          -1.328e-04  1.506e-05  -8.816  < 2e-16 ***
TmaxTN3          -4.936e-05  7.465e-06  -6.612 7.59e-11 ***
PeaceConst       -1.091e+00  8.298e-02 -13.146  < 2e-16 ***
MonthNameAug     -1.972e-01  8.454e-02  -2.332 0.019968 *
MonthNameDec     -5.496e-01  1.035e-01  -5.313 1.46e-07 ***
MonthNameFeb     -2.734e-01  8.823e-02  -3.098 0.002024 **
MonthNameJan     -4.928e-01  9.093e-02  -5.420 8.26e-08 ***
MonthNameJul     -1.690e-01  9.007e-02  -1.876 0.061055 .
MonthNameJun     -6.173e-02  8.377e-02  -0.737 0.461408
MonthNameMar     -1.205e-02  8.055e-02  -0.150 0.881151
MonthNameMay      2.303e-02  7.819e-02   0.295 0.768388
MonthNameNov     -3.124e-01  8.198e-02  -3.810 0.000151 ***
MonthNameOct     -1.817e-01  7.698e-02  -2.360 0.018537 *
MonthNameSep      4.787e-03  7.562e-02   0.063 0.949543
LocPeace:YearVal  1.029e-01  3.540e-02   2.908 0.003759 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.428 on 691 degrees of freedom
  (1749 observations deleted due to missingness)
Multiple R-squared:  0.8974, Adjusted R-squared:  0.8938
F-statistic: 251.8 on 24 and 691 DF,  p-value: < 2.2e-16
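For a quick side-by-side of how sensitive each segment is, the sketch below converts a few parameters from the four models into percentage effects. It is a hedged Python illustration, not the original analysis: the dictionary `COEFS` and the helper `pct_effects` are names invented here, and the TmaxT figure uses only the linear term, so it approximates the effect of one extra degree near a 10 degree day and ignores the higher-order temperature terms.

```python
import math

# SnowFell, RainAmt and linear TmaxT coefficients copied from the four models.
COEFS = {
    "Peak":    {"SnowFell": -0.06856, "RainAmt": -0.02437, "TmaxT": 0.04002},
    "Midday":  {"SnowFell": -0.1056,  "RainAmt": -0.03022, "TmaxT": 0.06708},
    "Evening": {"SnowFell": -0.1374,  "RainAmt": -0.03578, "TmaxT": 0.06135},
    "Weekend": {"SnowFell": -0.1984,  "RainAmt": -0.02812, "TmaxT": 0.1120},
}


def pct_effects(coefs: dict) -> dict:
    """Convert log-linear coefficients into percentage changes in the count."""
    return {name: 100.0 * (math.exp(b) - 1.0) for name, b in coefs.items()}


table = {segment: pct_effects(c) for segment, c in COEFS.items()}
for segment, effects in table.items():
    row = ", ".join(f"{name} {value:+.1f}%" for name, value in effects.items())
    print(f"{segment:8s} {row}")
```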
As I said, I intend to write a verbose blog post in the next few days with graphs and charts that are more decipherable for the general audience, so stay tuned for that.
The photo at the top is CC BY-SA 2.0 from Bike Calgary.
3 thoughts on “How weather affects cycling – methodology”
Did you consider a poisson or negative binomial regression? A log-linear transformation is biased for count data because it assumes normality in the error terms, whereas count data does not generally exhibit normal errors.
The short answer is I didn’t think as much as I should about what distribution to use, and you raise a good point. The longer answer is that if I recall correctly, Poisson tends towards normal when the mean is relatively high, and the means here are high. (It’s ironic I didn’t describe the counts in terms of mean or median, given the blog name!) The least common is the Evening, with a mean of 154 and a median of 107; the other extreme is Peak, where the mean count is 543 and the median 510. Midday has a mean of 200 and a median of 179; Weekend has a mean of 423 and median of 315 in the dataset I used. Only about 1.5% of observations were less than 10, which is where Poisson starts really diverging from normal.
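To put a rough number on “Poisson tends towards normal when the mean is relatively high”, the Python sketch below compares the Poisson probabilities with a normal density of the same mean and variance. It is only an illustration under stated assumptions: the function names are invented here, the gap measure (largest pointwise difference, as a share of the Poisson peak) is one arbitrary choice among many, and it says nothing about whether the actual counts are Poisson distributed.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of k, computed on the log scale to avoid overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def normal_pdf(x: float, mean: float) -> float:
    """Normal density with variance equal to the mean, matching a Poisson."""
    return math.exp(-((x - mean) ** 2) / (2.0 * mean)) / math.sqrt(2.0 * math.pi * mean)


def max_relative_gap(mean: float) -> float:
    """Largest pointwise gap between the two, as a share of the Poisson peak."""
    upper = int(mean + 10.0 * math.sqrt(mean)) + 1
    peak = max(poisson_pmf(k, mean) for k in range(upper))
    gap = max(abs(poisson_pmf(k, mean) - normal_pdf(k, mean)) for k in range(upper))
    return gap / peak


# Segment means quoted in the reply, plus a low mean of 5 for contrast.
for label, mean in [("low count", 5), ("Evening", 154), ("Midday", 200),
                    ("Weekend", 423), ("Peak", 543)]:
    print(f"{label:10s} mean {mean:4d}: gap {max_relative_gap(mean):.1%}")
```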