How weather affects cycling – methodology

As part of the World Winter Cycling Congress in Calgary, I’m presenting a paper entitled “Wondering What’s Winter Weather”, which is a data-driven look at what weather factors impact cycling. This is the data dump / methodology version of the post; I’ll produce another in a few days that is words and brightly-coloured charts, so if you don’t want the gory guts, follow the blog and you’ll see it pop up pretty soon.

Here’s my slide deck: K_Stefan_WBC_05

We are lucky enough here in Calgary that the weather is less than perfectly correlated with the season — instead of winter being a constant -10 and snowy, the high temperature can be +12 like it was the weekend before last… or -25 like it was yesterday. That means it’s easier to tease apart different weather factors, and to try to separate weather effects from seasonal effects.

Model design

I combined four years of data from the cycle counters at the Peace Bridge and the 5th St underpass with weather data and estimated four models for different “segments”. (Thanks to Chealion’s GitHub, which helped immensely with extracting the count data.) Each segment represents cyclists in a different time period, detailed at the top of each model below. These are log-linear regressions, i.e. linear regressions where the dependent variable is ln(number of cyclists); this transformation was done when building the dataset. This has a number of advantages: the model can’t go negative (and neither can the counts), and the results are elasticities, i.e. proportional effects, so instead of rain resulting in 100 fewer cyclists, the model will say rain results in 10% fewer.

The elasticities for each parameter are calculated by using exp(parameter)-1, but when they’re close to 0 (as most of them are), you can get an approximate sense of what they are by reading them directly. For instance, the parameter for snow falling on a day for the peak segment is -0.06856; this means that snow falling results in a reduction in cyclists of: (e^-0.06856) – 1 = -0.0663, which is a 6.63% reduction (and pretty close to the 0.06856 parameter).
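To make the conversion concrete, here is a minimal Python sketch of the exp(parameter)-1 calculation. The models themselves were estimated in R; the function name pct_change is my own and not part of the original analysis.

```python
import math

def pct_change(coef: float) -> float:
    """Percentage change in cyclists implied by a log-linear coefficient."""
    # expm1(x) computes exp(x) - 1 accurately, even when x is close to 0
    return 100.0 * math.expm1(coef)

# SnowFell coefficient from the peak model
print(f"SnowFell:   {pct_change(-0.06856):.2f}%")

# PeaceConst coefficient from the peak model: far from 0, so reading the
# parameter directly is no longer a good approximation
print(f"PeaceConst: {pct_change(-1.137):.1f}%")
```

The first line reproduces the 6.63% reduction worked out above. The second shows where the shortcut breaks down: a parameter of -1.137 corresponds to a drop of roughly 68%, not 114%.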

Parameters

The full model estimations are below; there’s one for each segment. The parameters are:

  • LocPeace – Dummy for all Peace Bridge counts; effectively establishes two intercepts, one for 5th St and one for the Peace Bridge, so a “typical” day (+10 degrees, dry, no wind, in April) can have different average numbers of cyclists at each location.
  • SnowFell – Binary, 1 if snow fell on the day
  • SnowAmt – Amount of snow that fell (in cm)
  • SnowExists – Binary, 1 if any snow on the ground
  • SnowDepth – Depth of snow on ground (in cm)
  • RainExists – Binary, 1 if any rain fell that day
  • RainAmt  – Amount of rain that fell (in cm)
  • TmaxT etc. – Nonlinear function of temperature (in Celsius). TmaxT is the daily maximum temperature minus 10 degrees (i.e. the function is relative to a 10 degree day); P and N represent additional parameters that apply when TmaxT is positive or negative (relative to 10 degrees) respectively; the final number is the power that TmaxT is raised to. So if the daily high was 7 degrees, TmaxT is -3 and the temperature function is:
 TmaxT * (-3) + TmaxTN2 * (-3)^2 + TmaxTN3 * (-3)^3
  • Eve_Daylight – Number of hours of daylight in the evening
  • MonthNameAug to MonthNameSep – Binary for each month (April is the reference month, so it has no parameter of its own)
  • IsFri – Binary, 1 for Fridays
  • PeaceConst – Binary, 1 for Peace Bridge in 2018 when construction meant detours on pathway in area
  • YearVal – Used to capture growth; number of years since January 1, 2015 (i.e. its parameter is the annual growth rate)
  • LocPeace:YearVal – Second growth rate for Peace Bridge only; this permits different growth rates at each location
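The temperature function described in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to build the models: the helper temperature_term is hypothetical, it assumes P terms contribute nothing on days at or below 10 degrees and N terms contribute nothing on days at or above 10 degrees, and the coefficients are copied from the peak model output further down.

```python
# Temperature coefficients copied from the peak-segment model output
PEAK_TEMP = {
    "TmaxT": 4.002e-02,
    "TmaxTP3": -4.129e-05,
    "TmaxTN2": 2.882e-03,
    "TmaxTN3": 1.037e-04,
}

def temperature_term(tmax_c: float, coefs: dict) -> float:
    """Log-scale contribution of temperature, relative to a 10 degree day."""
    t = tmax_c - 10.0  # this is TmaxT
    term = coefs.get("TmaxT", 0.0) * t
    for name, b in coefs.items():
        if name == "TmaxT":
            continue
        # e.g. "TmaxTN3" -> side "N", power 3
        side, power = name[5], int(name[6:])
        # Assumption: P terms are active only above 10 degrees,
        # N terms only below 10 degrees
        if (side == "P" and t > 0) or (side == "N" and t < 0):
            term += b * t ** power
    return term

print(temperature_term(7.0, PEAK_TEMP))   # daily high of 7 degrees
print(temperature_term(20.0, PEAK_TEMP))  # daily high of 20 degrees
```

For a 7 degree day in the peak segment the term is about -0.097, which by the exp(parameter)-1 rule is roughly 9% fewer cyclists than on a 10 degree day.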

Peak Cyclists

This segment is cyclists travelling into downtown between 7 and 9 AM, or out of downtown between 4 and 6 PM, on weekdays (excluding holidays). I think it’s mostly commuters; it’s the least sensitive to weather conditions overall.

Call:
lm(formula = LnPeak ~ Loc + SnowFell + SnowDepth + SnowExists + 
    SnowAmt + RainAmt + YearVal + TmaxT + TmaxTP3 + TmaxTN2 + 
    TmaxTN3 + YearVal * Loc + PeaceConst + MonthName + IsFri, 
    data = Bike2)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4931 -0.1288  0.0196  0.1621  1.1117 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       5.912e+00  4.283e-02 138.049  < 2e-16 ***
LocPeace          4.151e-01  3.410e-02  12.172  < 2e-16 ***
SnowFell         -6.856e-02  2.920e-02  -2.348 0.018975 *  
SnowDepth        -1.889e-02  3.251e-03  -5.809 7.45e-09 ***
SnowExists       -2.292e-01  3.216e-02  -7.127 1.51e-12 ***
SnowAmt          -3.317e-02  5.618e-03  -5.904 4.27e-09 ***
RainAmt          -2.437e-02  2.543e-03  -9.583  < 2e-16 ***
YearVal           5.422e-02  1.275e-02   4.253 2.22e-05 ***
TmaxT             4.002e-02  3.042e-03  13.155  < 2e-16 ***
TmaxTP3          -4.129e-05  7.359e-06  -5.610 2.36e-08 ***
TmaxTN2           2.882e-03  4.385e-04   6.574 6.49e-11 ***
TmaxTN3           1.037e-04  1.361e-05   7.615 4.32e-14 ***
PeaceConst       -1.137e+00  4.110e-02 -27.656  < 2e-16 ***
MonthNameAug      8.422e-02  4.375e-02   1.925 0.054400 .  
MonthNameDec     -2.197e-01  5.174e-02  -4.246 2.29e-05 ***
MonthNameFeb     -2.971e-01  4.487e-02  -6.621 4.76e-11 ***
MonthNameJan     -3.409e-01  4.791e-02  -7.116 1.63e-12 ***
MonthNameJul      6.943e-02  4.570e-02   1.519 0.128915    
MonthNameJun      2.569e-01  4.244e-02   6.053 1.74e-09 ***
MonthNameMar     -2.279e-01  4.078e-02  -5.589 2.65e-08 ***
MonthNameMay      1.434e-01  4.018e-02   3.569 0.000369 ***
MonthNameNov      6.086e-02  4.268e-02   1.426 0.154048    
MonthNameOct      1.219e-01  3.977e-02   3.066 0.002206 ** 
MonthNameSep      1.948e-01  3.970e-02   4.906 1.02e-06 ***
IsFri            -2.973e-01  2.037e-02 -14.594  < 2e-16 ***
LocPeace:YearVal  1.779e-02  1.766e-02   1.008 0.313781    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3379 on 1723 degrees of freedom
  (716 observations deleted due to missingness)
Multiple R-squared:  0.8233,	Adjusted R-squared:  0.8208 
F-statistic: 321.2 on 25 and 1723 DF,  p-value: < 2.2e-16
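As a rough illustration of how to read the intercepts, the Python sketch below back-transforms the peak model for a day on which every other term is zero: a 10 degree high, no snow or rain, April, not a Friday, no construction, and YearVal = 0. The function name baseline_count is my own. Exponentiating a log-scale prediction also gives something closer to a typical (median-like) count than an arithmetic mean, so treat the numbers as indicative only.

```python
import math

# Coefficients copied from the peak-segment output above
INTERCEPT = 5.912
LOC_PEACE = 0.4151

def baseline_count(at_peace_bridge: bool) -> float:
    """Back-transformed prediction with every other model term at zero."""
    log_count = INTERCEPT + (LOC_PEACE if at_peace_bridge else 0.0)
    return math.exp(log_count)

print(f"5th St:       {baseline_count(False):.0f} cyclists")
print(f"Peace Bridge: {baseline_count(True):.0f} cyclists")
```

This comes out at roughly 370 peak cyclists at 5th St and roughly 560 at the Peace Bridge on such a baseline day.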

 

Midday cyclists

This segment is cyclists between 11 AM and 3 PM on weekdays (excluding holidays); it’s an interesting one — relatively insensitive to most weather conditions except temperature. I don’t have a lazy shorthand to describe this segment; it’s a mix of commuters, couriers, errand runners, leisure cyclists…

Call:
lm(formula = LnMid ~ Loc + SnowFell + SnowDepth + SnowExists + 
    SnowAmt + RainAmt + Gust + YearVal + TmaxT + TmaxTP3 + TmaxTN2 + 
    TmaxTN3 + YearVal * Loc + PeaceConst + MonthName + IsFri, 
    data = Bike2)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2184 -0.1956  0.0046  0.2247  1.6715 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       5.161e+00  4.435e-02 116.355  < 2e-16 ***
LocPeace         -3.326e-01  3.491e-02  -9.527  < 2e-16 ***
SnowFell         -1.056e-01  2.987e-02  -3.536 0.000417 ***
SnowDepth        -2.623e-02  3.322e-03  -7.895 5.11e-15 ***
SnowExists       -1.539e-01  3.287e-02  -4.683 3.05e-06 ***
SnowAmt          -2.542e-02  5.739e-03  -4.430 1.00e-05 ***
RainAmt          -3.022e-02  2.618e-03 -11.540  < 2e-16 ***
Gust             -2.336e-03  7.289e-04  -3.205 0.001375 ** 
YearVal           9.518e-03  1.302e-02   0.731 0.465013    
TmaxT             6.708e-02  3.129e-03  21.438  < 2e-16 ***
TmaxTP3          -6.517e-05  7.550e-06  -8.632  < 2e-16 ***
TmaxTN2           2.436e-03  4.486e-04   5.430 6.43e-08 ***
TmaxTN3           4.176e-05  1.393e-05   2.997 0.002769 ** 
PeaceConst       -1.103e+00  4.207e-02 -26.230  < 2e-16 ***
MonthNameAug      3.440e-02  4.479e-02   0.768 0.442639    
MonthNameDec     -1.937e-01  5.306e-02  -3.651 0.000269 ***
MonthNameFeb     -2.658e-01  4.596e-02  -5.783 8.69e-09 ***
MonthNameJan     -2.735e-01  4.911e-02  -5.570 2.96e-08 ***
MonthNameJul      8.873e-02  4.675e-02   1.898 0.057884 .  
MonthNameJun      9.170e-02  4.345e-02   2.111 0.034955 *  
MonthNameMar     -1.387e-01  4.174e-02  -3.322 0.000913 ***
MonthNameMay      1.346e-02  4.114e-02   0.327 0.743522    
MonthNameNov      1.404e-02  4.370e-02   0.321 0.747995    
MonthNameOct      6.396e-02  4.071e-02   1.571 0.116362    
MonthNameSep      5.745e-02  4.079e-02   1.408 0.159211    
IsFri             1.426e-01  2.081e-02   6.851 1.02e-11 ***
LocPeace:YearVal  1.004e-01  1.808e-02   5.552 3.26e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3452 on 1721 degrees of freedom
  (717 observations deleted due to missingness)
Multiple R-squared:  0.8581,	Adjusted R-squared:  0.8559 
F-statistic: 400.2 on 26 and 1721 DF,  p-value: < 2.2e-16
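The two growth parameters combine additively on the log scale, so the Peace Bridge growth rate is YearVal plus LocPeace:YearVal. A small Python sketch of that arithmetic for the midday model (the function name annual_growth is my own, not from the original analysis):

```python
import math

def annual_growth(year_coef: float, interaction_coef: float = 0.0) -> float:
    """Annual percentage growth implied by YearVal plus an optional interaction."""
    return 100.0 * math.expm1(year_coef + interaction_coef)

# Midday model: YearVal = 9.518e-03, LocPeace:YearVal = 1.004e-01
print(f"5th St:       {annual_growth(9.518e-03):.1f}% per year")
print(f"Peace Bridge: {annual_growth(9.518e-03, 1.004e-01):.1f}% per year")
```

That is about 1% a year at 5th St (a parameter that is not statistically distinguishable from zero in this model) against about 11.6% a year at the Peace Bridge.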

 

Evening cyclists

Cyclists travelling between 7 PM and midnight on weekdays (excluding holidays); these are clearly leisure cyclists.

Call:
lm(formula = LnEve ~ Loc + SnowFell + SnowDepth + SnowExists + 
    SnowAmt + RainAmt + Gust + RainExists + YearVal + TmaxT + 
    TmaxTP3 + TmaxTN2 + Eve_Daylight + YearVal * Loc + PeaceConst + 
    MonthName + DayName, data = Bike2)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.3100 -0.1933  0.0063  0.2152  1.7274 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       4.357e+00  9.464e-02  46.034  < 2e-16 ***
LocPeace          1.404e-01  3.734e-02   3.759 0.000176 ***
SnowFell         -1.374e-01  3.238e-02  -4.243 2.32e-05 ***
SnowDepth        -2.995e-02  3.550e-03  -8.436  < 2e-16 ***
SnowExists       -2.487e-01  3.523e-02  -7.060 2.41e-12 ***
SnowAmt          -3.444e-02  6.142e-03  -5.607 2.39e-08 ***
RainAmt          -3.578e-02  3.163e-03 -11.311  < 2e-16 ***
Gust             -7.106e-03  7.842e-04  -9.062  < 2e-16 ***
RainExists       -1.663e-01  3.236e-02  -5.139 3.08e-07 ***
YearVal           6.019e-03  1.394e-02   0.432 0.665904    
TmaxT             6.135e-02  2.932e-03  20.924  < 2e-16 ***
TmaxTP3          -3.604e-05  7.559e-06  -4.767 2.02e-06 ***
TmaxTN2           1.007e-03  1.295e-04   7.781 1.24e-14 ***
Eve_Daylight      1.116e-01  3.213e-02   3.474 0.000525 ***
PeaceConst       -9.999e-01  4.497e-02 -22.233  < 2e-16 ***
MonthNameAug      2.540e-02  4.867e-02   0.522 0.601779    
MonthNameDec     -3.949e-01  9.544e-02  -4.138 3.67e-05 ***
MonthNameFeb     -5.334e-01  9.123e-02  -5.847 5.98e-09 ***
MonthNameJan     -3.811e-01  9.420e-02  -4.046 5.44e-05 ***
MonthNameJul      1.814e-01  5.923e-02   3.063 0.002229 ** 
MonthNameJun      2.116e-01  5.991e-02   3.533 0.000422 ***
MonthNameMar     -2.558e-01  5.772e-02  -4.432 9.94e-06 ***
MonthNameMay      6.708e-02  4.943e-02   1.357 0.174954    
MonthNameNov     -1.253e-01  9.269e-02  -1.352 0.176494    
MonthNameOct     -5.726e-02  7.439e-02  -0.770 0.441571    
MonthNameSep      1.004e-01  5.192e-02   1.933 0.053387 .  
DayName3_Tue      6.756e-02  2.851e-02   2.370 0.017916 *  
DayName4_Wed      1.080e-01  2.855e-02   3.783 0.000160 ***
DayName5_Thu      1.237e-01  2.851e-02   4.341 1.50e-05 ***
DayName6_Fri      6.581e-02  2.882e-02   2.283 0.022547 *  
LocPeace:YearVal  8.305e-02  1.933e-02   4.296 1.83e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3694 on 1718 degrees of freedom
  (716 observations deleted due to missingness)
Multiple R-squared:  0.8942,	Adjusted R-squared:  0.8924 
F-statistic: 484.1 on 30 and 1718 DF,  p-value: < 2.2e-16

Weekend cyclists

Cyclists travelling on weekend afternoons, 1 to 7 PM (excluding holidays). These are also leisure cyclists; they are generally a little less sensitive to weather than evening cyclists, except for temperature. I suspect more of these are suburban residents out for a cycle relative to evening cyclists, who are probably more local inner city residents.

Call:
lm(formula = LnWE ~ Loc + SnowFell + SnowDepth + SnowAmt + RainAmt + 
    Gust + RainExists + YearVal + TmaxT + TmaxTP3 + TmaxTN3 + 
    YearVal * Loc + PeaceConst + MonthName, data = Bike2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.64757 -0.18733  0.01455  0.22306  1.08863 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       5.414e+00  8.319e-02  65.078  < 2e-16 ***
LocPeace          5.495e-01  6.787e-02   8.095 2.58e-15 ***
SnowFell         -1.984e-01  6.103e-02  -3.251 0.001205 ** 
SnowDepth        -2.964e-02  6.297e-03  -4.708 3.03e-06 ***
SnowAmt          -7.645e-02  1.436e-02  -5.325 1.37e-07 ***
RainAmt          -2.812e-02  6.393e-03  -4.399 1.26e-05 ***
Gust             -7.440e-03  1.433e-03  -5.192 2.74e-07 ***
RainExists       -2.065e-01  5.402e-02  -3.822 0.000144 ***
YearVal          -1.731e-02  2.524e-02  -0.686 0.493117    
TmaxT             1.120e-01  4.897e-03  22.874  < 2e-16 ***
TmaxTP3          -1.328e-04  1.506e-05  -8.816  < 2e-16 ***
TmaxTN3          -4.936e-05  7.465e-06  -6.612 7.59e-11 ***
PeaceConst       -1.091e+00  8.298e-02 -13.146  < 2e-16 ***
MonthNameAug     -1.972e-01  8.454e-02  -2.332 0.019968 *  
MonthNameDec     -5.496e-01  1.035e-01  -5.313 1.46e-07 ***
MonthNameFeb     -2.734e-01  8.823e-02  -3.098 0.002024 ** 
MonthNameJan     -4.928e-01  9.093e-02  -5.420 8.26e-08 ***
MonthNameJul     -1.690e-01  9.007e-02  -1.876 0.061055 .  
MonthNameJun     -6.173e-02  8.377e-02  -0.737 0.461408    
MonthNameMar     -1.205e-02  8.055e-02  -0.150 0.881151    
MonthNameMay      2.303e-02  7.819e-02   0.295 0.768388    
MonthNameNov     -3.124e-01  8.198e-02  -3.810 0.000151 ***
MonthNameOct     -1.817e-01  7.698e-02  -2.360 0.018537 *  
MonthNameSep      4.787e-03  7.562e-02   0.063 0.949543    
LocPeace:YearVal  1.029e-01  3.540e-02   2.908 0.003759 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.428 on 691 degrees of freedom
  (1749 observations deleted due to missingness)
Multiple R-squared:  0.8974,	Adjusted R-squared:  0.8938 
F-statistic: 251.8 on 24 and 691 DF,  p-value: < 2.2e-16

 

As I said, I intend to write a verbose blog post in the next few days with graphs and charts that are more decipherable for the general audience, so stay tuned for that.

The photo at the top is CC BY-SA 2.0 from Bike Calgary.

3 thoughts on “How weather affects cycling – methodology”

  1. Did you consider a Poisson or negative binomial regression? A log-linear transformation is biased for count data because it assumes normality in the error terms, whereas count data does not generally exhibit normal errors.

    1. The short answer is I didn’t think as much as I should have about what distribution to use, and you raise a good point. The longer answer is that, if I recall correctly, Poisson tends towards normal when the mean is relatively high, and the means here are high. (It’s ironic I didn’t describe the counts in terms of mean or median, given the blog name!) The lowest counts are in the Evening segment, with a mean of 154 and a median of 107; the other extreme is Peak, where the mean count is 543 and the median 510. Midday has a mean of 200 and a median of 179; Weekend has a mean of 423 and a median of 315 in the dataset I used. Only about 1.5% of observations were less than 10, which is where Poisson starts really diverging from normal.
