Rankings of best performances 2015/2016

Discussion in 'Long Track' started by EenBrabander, 9 Oct 2015.

  1. There certainly are drawbacks. I should mention that the most important thing for me was to find a better model than the one I used before, which was based on World Cup points.

    In fact I have sort of tested the model, but not in the way that you suggest.

    I used the skaters' ranking points to estimate their chances of winning the next race, for example at a World Cup. The winning probabilities estimated by the model came quite close to the actual results.

    So I think for the moment I will stick to the model. Changing it once again will be a bit more time-consuming than I can afford...

  2. I haven't been able to follow the explanation here, so I have a couple of questions:

    Is this the "top level" of those skaters, or is it some kind of "average" level - and if so, which races are counted in?
  3. Forza

    Forza Active Member

    Sure, I'm not asking you to change your model, just making sure that the drawbacks are clear. As I said, in essence I like the idea.

    Testing winning probabilities also seems a good way to test it; I like that. If it's simple to do, I would still test the winning probabilities against rating differences. So far you have only tested the top competitors, whereas that test would also check the validity for lower- and mid-tier competitors. If it also gives good results, you have more evidence for the validity of your model than you have now.
  4. EenBrabander

    EenBrabander Well-Known Member

    Top level, not average. I've counted World Cup/WCH/OG races, and sometimes a National Championship.

    I'll explain (in English) how Pavel Kulizhnikov's 1:06.70 is corrected (note ø):
    In November 2015, Pavel skated 1:06.70 at the Utah Olympic Oval in Salt Lake City. The track is at an altitude of 1425 meters, where the average air pressure is 854.2 hPa; at sea level that value is 1014 hPa. After height correction you get 1:08.49, which means that if this race had been skated at sea level instead of at 1425 m, it would have been a 1:08.49. Next comes the air pressure correction. The SLP was quite high: 1028 mbar. No, SLP doesn't mean Sverre Lunde Pedersen here, but Sea Level Pressure: the air pressure corrected to sea level. With normal (lower) air pressure, Kulizhnikov would have skated faster. @SprintMaster's model corrects it to 1:08.19, my model to 1:08.12 (note 1). Now some other things kick in, such as the track speed correction. Compared to Heerenveen in season 15/16, SLC is 0.18% slower than Thialf would be if the two rinks were at the same altitude (note 2). So I multiply 1:08.12 by ((100-0.18)/100)^0.8, which gives 1:08.02. Two corrections remain. The first is the average season speed: how fast skaters skated in the competitions of that season and some previous seasons, based on values as described in note 2. Here this value is 1:09.28, which changes Kulizhnikov's time by only 0.02. Had it been 1:10.28, it would have changed the time by half a second. This way you can correct for things like the introduction of clap skates: the average season speed was much slower back then, because skaters were in fact slower. The last correction is the skater-specific factor, which corrects for how much a skater prefers skating on highland or lowland rinks. For that, I compare the skater's uncorrected PBs with the averages of the five fastest times on highland and on lowland rinks. Kulizhnikov doesn't really have a preference, since he skates extremely fast on both, so his value is so close to 1 that it no longer affects his time. (This factor is almost never larger than 0.15%. Track factors are sometimes 2 or 3% on outdoor tracks.)
    After all the corrections, his time is 1:08.00.

    Note ø: I cån't spæk Norwegian, thøugh I'd like to be åble to dø that. / Jeg kan ikke snakker Norsk, men jeg vil det reelt sett. (is that correct?)

    Note 1: my model corrects more at higher speeds. If his time had been 1:17.70, SprintMaster's model would correct more than mine.

    Note 2: this is based on the biased top-10 average (BTA). The formula defining the BTA is
    BTA=(100*B4+80*B5+70*B6+60*B7+50*B8+45*B9+40*B10+35*B11+30*B12+25*B13)/535
    where the no. 1 time is in cell B4, the no. 2 time in cell B5, etc. I apply the height correction to the BTAs. Averaging the BTAs of multiple competitions on the same track essentially shows how fast that track is, and comparing it with other tracks shows how much faster or slower track A is than track B. That's how I get my track speed correction factors.
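The BTA formula and the track comparison in note 2 can be sketched in Python as follows. This is a minimal sketch that assumes the times are already height corrected and given in seconds. `track_speed_ratio` is a hypothetical helper that simply divides the mean BTAs of two tracks, because the note does not say exactly how the comparison is done.

```python
# BTA weights for places 1..10, taken from the formula above (they sum to 535).
BTA_WEIGHTS = (100, 80, 70, 60, 50, 45, 40, 35, 30, 25)


def bta(times):
    """Biased top-10 average of one competition (times in seconds).

    The times are sorted so the fastest gets weight 100 (cell B4 in the
    spreadsheet), the second fastest weight 80, and so on.
    """
    top10 = sorted(times)[:10]
    if len(top10) < 10:
        raise ValueError("a BTA needs at least 10 times")
    return sum(w * t for w, t in zip(BTA_WEIGHTS, top10)) / sum(BTA_WEIGHTS)


def track_speed_ratio(btas_a, btas_b):
    """Hypothetical helper: mean BTA on track A divided by mean BTA on track B.

    Assumes the BTAs are already height corrected. A ratio above 1 means
    track A is slower than track B.
    """
    mean_a = sum(btas_a) / len(btas_a)
    mean_b = sum(btas_b) / len(btas_b)
    return mean_a / mean_b
```

With this convention, a ratio of 1.0018 would correspond to a track that is 0.18% slower.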
    Last edited: 3 Jun 2016
  5. This is truly interesting stuff! It feels like I've entered a club of extreme nerds. A few more questions:

    • Height corrections and air pressure corrections: are those calculated theoretically or empirically? Incidentally, in my model a time of 1:06.70 at SLC equals 1:08.65 at Thialf. And if the SLP was 1028 instead of 1013.25, my model gives an air pressure correction of only 0.15 s, to 1:08.50.
    • Are the track speed corrections and average season speeds based on the top 10 times of the tracks/seasons? But wouldn't these results be affected by the fact that some tracks host big competitions more often than others?
    • Skater-specific factor: so if a skater is normally better on slower tracks, he gets an extra reward if for once he skates a good race on a fast track? That doesn't seem logical to me.

    My all-time ELO lists aren't published on the Skøyteranking site yet, but it's tempting to give you a sneak peek:


    Full name        Country  Time   Rink        Date      ELO   (corrected time)
    K. Nuis          NED      68.24  Heerenveen  29.12.15  2241  (1:08.12)
    S. Davis         USA      66.91  Calgary     06.12.09  2234  (1:08.15)
    P. Kulizjnikov   RUS      68.16  Heerenveen  12.12.15  2230  (1:08.16)
    S. Groothuis     NED      68.39  Sochi       12.02.14  2177  (1:08.36)
    D. Morrison      CAN      68.43  Sochi       12.02.14  2166  (1:08.40)

    Fun to see that I have the same top 5 as you, but only PK in the same position!
    Nice try... A better version would be: Jeg kan ikke snakke norsk, men jeg skulle gjerne ha kunnet det. (I can't speak Norwegian, but I wish I could.) Or:

    Jeg ønsker virkelig at jeg hadde forstått norsk. Kanskje jeg kan lære noen norske setninger? / I really would like to understand Norwegian. Perhaps I can learn some phrases?
  7. SprintMaster

    SprintMaster Custom Staff Member

    Welcome to the club! :cool:

    Air pressure corrections are calculated theoretically. To find the correction, you have to:

    1. Enter the original speed and the air density in the power formula (see http://www.blog.ultracycle.net/2010/05/cycling-power-calculations), which gives the power in watts.
    2. The same power must come out of the power formula when you enter the average air density, so the unknown you have to solve for is the speed. If the speed is constant this is easy, since it is a cubic equation (see https://en.wikipedia.org/wiki/Cubic_function). But, as always, it's not that simple, because the acceleration after the start matters, especially on the short distances. Which brings me to the third step:
    3. To calculate the correction, I set up a model of the mechanical equations describing the acceleration, speed and distance skated. A skater accelerates during roughly the first 15 seconds of a race and then maintains that speed until the end of the first full lap. After the first lap, the speed is assumed to change linearly, with a constant acceleration that depends on the lap times. Filling in the opening time and lap times, I solve the equations iteratively, assuming that the maximum power must be minimized.
    4. This gives parameter values that I then use to solve the equations for the corrected situation, assuming that the maximum power and the initial acceleration stay the same.
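Here is a rough sketch of the constant-speed case in step 2. It computes the power at the original speed and air density, then finds the speed that needs the same power at the new density. The power model is a simplified assumption (air drag plus a linear ice-friction term), not the exact formula from the linked page. The values for CdA, friction coefficient and mass used in the checks are placeholders. The sketch also ignores the start acceleration that step 3 deals with.

```python
G = 9.81  # gravitational acceleration, m/s^2


def power_needed(v, rho, cda, mu, mass):
    """Power (W) to hold constant speed v (m/s): air drag plus ice friction.

    Simplified model (an assumption): P = 0.5*rho*CdA*v^3 + mu*m*g*v.
    """
    return 0.5 * rho * cda * v ** 3 + mu * mass * G * v


def speed_at_density(v0, rho0, rho1, cda, mu, mass, tol=1e-10):
    """Speed at air density rho1 that needs the same power as v0 at rho0.

    P(v) is strictly increasing for v > 0 (assuming rho1*cda > 0 or mu > 0),
    so the cubic has exactly one positive root, which bisection finds.
    """
    target = power_needed(v0, rho0, cda, mu, mass)
    lo, hi = 0.0, 2.0 * v0
    # Widen the bracket until it contains the root.
    while power_needed(hi, rho1, cda, mu, mass) < target:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_needed(mid, rho1, cda, mu, mass) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Over a fixed distance, a constant-speed correction would then be t1 = t0 * v0 / v1.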

    Height corrections are calculated empirically, because I discovered that times skated on highland rinks were not as much faster as I expected from the values found for the air pressure correction. To estimate the correction, I took the five races with the best top-5 averages skated in Calgary and in Heerenveen in 2005-2014. I chose Calgary as the highland rink because the quality of its ice is more constant than in Salt Lake City. The results were somewhat surprising: the largest corrections were on the 1000 and 1500m, and they were greater on the ladies' distances than on the men's, except for the 5000m.

    For example this is how my model corrects the 1000m time skated by Kulizhnikov in Salt Lake City:

    [Image: Calculation_expl.jpg]

    To play with this, you have to use the following formulas in Excel:
    C4: =M4
    D4: =((1-0.00649*I4/(C4+0.00649*I4+273.15))^5.256)*B4
    E4: =A4*(1+P4)
    F4: =E4*(1+Q4)
    H4: =((1-0.00649*I4/(288.15))^5.256)*G4
    M4: =0.5*(K4+L4)-0.00649*(I4-J4)
    P4: =N4*(H4-D4)/H4
    Q4: =O4*(G4-H4)/G4
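For readers without Excel, here is the same row of formulas transcribed into Python. The screenshot with the column headers is not available, so the column-to-meaning mapping in the docstring is an assumption inferred from the formulas. For example, it assumes K4 and L4 are two temperatures measured at a weather station at altitude J4, and that G4 is 1013.25 hPa. The coefficients N4 and O4 are not given in the thread, so they are plain parameters here.

```python
LAPSE = 0.00649   # temperature lapse rate (K per metre) used in the sheet
EXPONENT = 5.256  # barometric exponent used in the sheet


def standard_pressure(alt, std_slp=1013.25):
    """H4: standard-atmosphere pressure (hPa) at altitude alt (m)."""
    return (1 - LAPSE * alt / 288.15) ** EXPONENT * std_slp


def correct_time(time, slp, temp_1, temp_2, rink_alt, station_alt,
                 air_coeff, height_coeff, std_slp=1013.25):
    """Transcription of row 4 of the sheet; returns (E4, F4).

    Assumed column mapping (the screenshot is not available):
    A=time, B=slp, G=std_slp, I=rink_alt, J=station_alt,
    K/L=temp_1/temp_2 in deg C, N=air_coeff, O=height_coeff.
    """
    # M4 (= C4): mean temperature moved from station altitude to rink altitude
    temp_rink = 0.5 * (temp_1 + temp_2) - LAPSE * (rink_alt - station_alt)
    # D4: actual pressure at the rink from sea-level pressure and temperature
    p_actual = (1 - LAPSE * rink_alt
                / (temp_rink + LAPSE * rink_alt + 273.15)) ** EXPONENT * slp
    # H4: pressure the rink would have in a standard atmosphere
    p_standard = standard_pressure(rink_alt, std_slp)
    # P4 and Q4: relative air pressure and height corrections
    air_corr = air_coeff * (p_standard - p_actual) / p_standard
    height_corr = height_coeff * (std_slp - p_standard) / std_slp
    time_air = time * (1 + air_corr)                # E4
    time_sea_level = time_air * (1 + height_corr)   # F4
    return time_air, time_sea_level
```

With standard conditions at the rink, the air pressure term drops out. A higher-than-standard SLP makes the corrected time faster for a positive N4.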

    Hmm, not an easy post to translate.
    Last edited: 3 Jun 2016
  8. SprintMaster

    SprintMaster Custom Staff Member

    The list of best performances on the 1000m since 2001 (all international senior competition races, plus some record races skated at other events), without extra corrections for slower ice rinks. It contains all skaters below 1:08.50 and some other best results. The first Norwegian I found is Flygind Larsen.

    [Image: Lijst_1000m_vanaf2001.jpg]
  9. EenBrabander

    EenBrabander Well-Known Member

    @Skøyteranking
    To add some confusion: my model is different from @SprintMaster's model. :confused:

    This is my model:
    [Image: upload_2016-6-3_21-10-28.png]

    E3: corrected time=H3^0,8*L3*R3*(1-((((1000/M3)^4/(120376))^1,1*((((K3-1014)/1014))))))*(M3*((((-J3+1014)/1014)*100*0,17)/100+1))
    L3: skater-specific factor=((S3*I3+3+T3)/4)
    O3: highland skater factor=(N$2/N3)
    Q3: lowland skater factor=(P$2/P3)
    R3: season speed value=((1+((69,16+0,075)/S1))/2)
    S3=(O3-Q3)/900
    T3=(Q3/O3)

    With
    H3=rink factor
    I3=rink altitude
    J3=average rink pressure
    K3=sea level pressure
    M3=not yet corrected time
    N3=highland PB
    P3=lowland PB
    N$2=average of the five best highland times in the world (which is of course a different value for 2005/2006 than for 2015/2016)
    P$2=same as N$2, but for lowland


    E3 formula explanation:
    I multiply the uncorrected time by
    (rink factor)^0.8
    originally the rink factor had too much influence, so I added the exponent 0.8
    skater-specific factor
    this captures whether a skater prefers highland or lowland rinks
    season speed value
    how fast skaters usually skated at the time the time was set, explained in a previous post

    (1-((1000/original time)^4/120376)^1.1*((SLP-1014)/1014))
    this is my air pressure correction model. It corrects more at higher speeds and, imho, gives more plausible results. I'll explain it more clearly if you want, and if I'm able to. It took me a few hours to get to this thing... ;)
    (((1014-average rink pressure)/1014)*100*0.17)/100+1
    this is the height correction (it multiplies the uncorrected time), the same as in @SprintMaster's model but with 0.17 instead of 0.166.
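Here is a Python transcription of the E3 and L3 formulas, as a sketch for checking the numbers. The inputs in the checks come from the worked Kulizhnikov example earlier in the thread:

- rink factor 0.9982 (SLC 0.18% slower),
- average rink pressure 854.2 hPa,
- SLP 1028 hPa,
- season BTA 1:09.28 (69.28 s),
- skater-specific factor 1, because he has no real preference.

With those inputs the result should come out at about 68.00 s. Setting the rink factor, SLP and season terms to neutral values leaves only the height step (about 68.49 s). The constants 69.16 and 0.075 are copied as they stand. The speed term uses 1000/M3, so this sketch only covers the 1000m.

```python
def sporter_factor(rink_alt, highland_pb, lowland_pb,
                   world_highland, world_lowland):
    """L3 from the sheet, via the helper cells O3, Q3, S3 and T3."""
    o = world_highland / highland_pb   # O3 = N$2 / N3
    q = world_lowland / lowland_pb     # Q3 = P$2 / P3
    s = (o - q) / 900                  # S3
    t = q / o                          # T3
    return (s * rink_alt + 3 + t) / 4  # L3


def corrected_1000m(time, rink_factor, rink_pressure, slp, season_bta,
                    sporter=1.0):
    """E3 for a 1000m time (seconds), written out term by term."""
    # Height correction: M3*((((-J3+1014)/1014)*100*0.17)/100+1)
    height = time * (((1014 - rink_pressure) / 1014) * 100 * 0.17 / 100 + 1)
    # Air pressure correction; 1000/M3 is the average speed in m/s
    speed = 1000 / time
    air = 1 - (speed ** 4 / 120376) ** 1.1 * ((slp - 1014) / 1014)
    # R3, with S1 = season_bta
    season = (1 + (69.16 + 0.075) / season_bta) / 2
    return rink_factor ** 0.8 * sporter * season * air * height
```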
  10. SprintMaster

    SprintMaster Custom Staff Member

    It seems to me that the best-5 lowland average cannot be 68.14. I get 66.752 (highland) and 68.348 (lowland) when I take the 5 fastest times.

    @EenBrabander: I'm trying to figure the formulas out with Excel.

    Some values seem a bit magic:
    • 69,16 looks like an outdated skated time.
    • what is the meaning of 0.075?
    • where is the air pressure correction factor 0.243?
    Last edited: 3 Jun 2016
  11. It's understandable. Anyway, this is already going far over my head...!

    But to summarize, it seems to me that your models give more credit to times skated at high altitude than mine, which is of course embarrassingly simple compared to yours. But just like you, my correction factor for altitude is higher for the 1000 and 1500 m than for the other distances:

    Distance   Men     Ladies
    500 m      3.032   2.936
    1000 m     3.345   3.37
    1500 m     3.797   3.17
    3000 m     -       2.562
    5000 m     2.16    1.63
    10000 m    2.011   -
  12. This sounds reasonable. Håvard Bøkko has been national champion six times, but has never really given his full "attention" to the 1000 m.
  13. SprintMaster

    SprintMaster Custom Staff Member

    For the Olympic season I used the same air pressure correction for all distances. As the season went on, I saw some unexpected differences between the corrected results of races by the same skater, especially on the 500m for Michel Mulder. His 34.66 race (air pressure 1035 hPa) at the 2012 World Championships was corrected to 0.10 s better than his 34.31 (990 hPa) Olympic qualification race. Both were skated in Heerenveen. Of course both races were great, but in December 2013 the level of the 500m (and of the other distances too) was very high, and it seemed very unlikely to me that his 34.31 was not his best race.
    So last year I had to build a model describing the acceleration of a skater. It took me quite some time to figure out how to solve the equations, but the outcome was great: for the 500m, the correction was much smaller than I expected! Michel Mulder's corrected OKT (Olympic qualification tournament) race was 0.06 better than his 2012 race. Also, on the 1000m, Groothuis's corrected Olympic gold race was better than his 2011 and 2012 races in Heerenveen.
    If I hadn't been able to compute the correction, I would have had to estimate it by comparing those two 500m races and simply assuming that the 34.31 was a bit better.
  14. EenBrabander

    EenBrabander Well-Known Member

    Pavel Kulizhnikov 1:08.10 Stavanger
    Pavel Kulizhnikov 1:08.10 Stavanger
    Kjeld Nuis 1:08.12 Stavanger
    Kjeld Nuis 1:08.13 Stavanger
    Pavel Kulizhnikov 1:08.16 Heerenveen
    1: that's the BTA for Heerenveen for season 15/16 (I used the 11/12 to 15/16 BTAs to calculate it). It's used for determining the track speed factor. I've explained the BTA in a previous post, which I'll quote again.
    2: no idea :eek:. Removing it makes all times ±0.04 faster, but doesn't change a lot.
    3: that one is speed dependent:
    (1-((1000/M3)^4/120376)^1.1*((K3-1014)/1014))
    The values resulting from this formula are around (1-0.34), because in your old model, on which I based mine, the air pressure correction value was 0.34 instead of 0.243.


    BTA explanation incoming!

    So the men's 1000m BTA for Heerenveen for 15/16 is a weighted average over five seasons:
    BTA = (1*S(15/16) + 0.75*S(14/15) + 0.55*S(13/14) + 0.38*S(12/13) + 0.25*S(11/12)) / (1*n(15/16) + 0.75*n(14/15) + 0.55*n(13/14) + 0.38*n(12/13) + 0.25*n(11/12))
    where S(season) is the sum of the BTAs of all men's 1000m competitions in that season and n(season) is the number of those competitions.
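The five-season weighting can be written as a small function. This is a minimal sketch. It assumes the input is a list of per-season lists of competition BTAs, with the current season first. Seasons beyond the fifth are ignored, as in the formula.

```python
SEASON_WEIGHTS = (1.0, 0.75, 0.55, 0.38, 0.25)  # current season first


def weighted_season_bta(btas_per_season):
    """Weighted average of competition BTAs over up to five seasons.

    btas_per_season[0] holds the BTAs of the current season,
    btas_per_season[1] those of the previous season, and so on.
    """
    num = 0.0
    den = 0.0
    # zip stops after five seasons, so older seasons get no weight
    for weight, btas in zip(SEASON_WEIGHTS, btas_per_season):
        num += weight * sum(btas)
        den += weight * len(btas)
    return num / den
```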
  15. I have accepted your challenge! But I'm a little bit confused (or should I say depressed) about the results:

    [Image: graf.png]

    I used the rankings as of December 2015 and tested how skaters with 10, 20, 30, etc. points more than certain opponents did against those opponents in races throughout the 2015/16 season. The winning probability was expected to follow the blue line, but the observed probability was quite a bit higher (orange dots). The number of observations per level isn't very high, mostly 20 to 30. But I'm troubled by the fact that it (almost) consistently lies above the blue line of expected probability.
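Here is a sketch of this kind of test in Python. It groups duels by the rating advantage of the higher-ranked skater and compares the observed win fraction in each group with the Elo expectation 1/(1+10^(-d/400)). Grouping into 10-point bins and evaluating the expectation at the bin centre are assumptions; the post does not say exactly how the levels were formed.

```python
from collections import defaultdict


def expected_score(diff):
    """Elo (logistic) win probability for a rating advantage of diff points."""
    return 1.0 / (1.0 + 10 ** (-diff / 400))


def calibration_table(duels, bin_width=10):
    """duels: iterable of (rating_diff, better_won), with rating_diff >= 0.

    Returns {bin_start: (n, observed win fraction, expected probability
    at the bin centre)} so observed and expected can be compared per bin.
    """
    bins = defaultdict(list)
    for diff, better_won in duels:
        bins[int(diff // bin_width)].append(1 if better_won else 0)
    table = {}
    for b, results in sorted(bins.items()):
        centre = (b + 0.5) * bin_width
        table[b * bin_width] = (len(results),
                                sum(results) / len(results),
                                expected_score(centre))
    return table
```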

    This was a test using rankings as of Dec'15. I will do a new test using the newest rankings (March'16), and this time only with 100 points difference. I'll get back to you!
  16. - A sigh of relief!

    At least, the result of this new test was more convincing than the first one.

    94 observations of "duels" in which the better skater is ranked 100 points above the weaker skater (post-season ranking).

    A 100-point better ranking corresponds to a winning probability of 1/(1+10^((-100)/400)) = 0.64. The observed probability of 0.66 is reasonably close, so I guess the system will continue without any corrections next year.
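To put "reasonably close" in numbers, here is a quick check of how far 0.66 lies from 0.64 for 94 duels. It treats the duels as independent. They are not fully independent, because the same skaters appear in several duels, so this is only a rough guide.

```python
import math

n = 94                                   # number of duels in the test
expected = 1 / (1 + 10 ** (-100 / 400))  # Elo win probability for +100 points
observed = 0.66                          # observed win fraction as reported
# Standard error of an observed proportion if the model probability is right
se = math.sqrt(expected * (1 - expected) / n)
z = (observed - expected) / se
print(f"expected={expected:.3f}, se={se:.3f}, z={z:.2f}")
# prints expected=0.640, se=0.050, z=0.40
```

A difference of about 0.4 standard errors is well within what chance alone would produce.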
  17. Forza

    Forza Active Member

    Thanks for carrying this out.

    With that last test you check the March ratings against races skated before March. Since the March ratings are highly correlated with those races, the winning probabilities should be almost correct. Clearly, something would be wrong with the model if it is unable to predict the historic observations with which it has been estimated (I expect that the same will also hold for lower differences if you test it like this).
    That by itself does not imply that the model is valid, by the way.

    I do find the bias when predicting future observations, as you showed in your first test, rather troublesome and I'm wondering what causes this. Will the estimates be better if you update the ratings after every race rather than taking the December ratings?
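Updating after every race would look roughly like the standard Elo update below. The K-factor of 20 is an assumption for illustration, and so is treating a race as a set of pairwise duels, each pair scored with the ratings from before the race. This is not how the Skøyteranking lists are computed.

```python
def expected_score(r_a, r_b):
    """Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update(r_a, r_b, a_won, k=20.0):
    """Standard Elo update for one duel; k=20 is an arbitrary example value."""
    e_a = expected_score(r_a, r_b)
    delta = k * ((1.0 if a_won else 0.0) - e_a)
    return r_a + delta, r_b - delta


def update_race(ratings, finish_order, k=20.0):
    """Treat a race as all pairwise duels (an assumption).

    ratings: dict skater -> rating before the race.
    finish_order: skaters from fastest to slowest.
    Every pair uses the pre-race ratings, so the order of pairs doesn't matter.
    """
    new = dict(ratings)
    for i, a in enumerate(finish_order):
        for b in finish_order[i + 1:]:
            # a finished ahead of b
            delta = k * (1.0 - expected_score(ratings[a], ratings[b]))
            new[a] += delta
            new[b] -= delta
    return new
```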
  18. Forza

    Forza Active Member

    That would be awesome. If properly tuned, the ELO-method can be a strong tool for predicting winners. That is something that cannot be done with the corrected times of Sprintmaster and EenBrabander, so it would be a very welcome addition to the set of available tools.
  19. SprintMaster

    SprintMaster Custom Staff Member

    @Skøyteranking: It took me some time to analyze your model. I also tried to find out how the model of ELO rating can be translated to the statistical model I use to compute the chances of skaters in a competition.

    You have to start somewhere, and if you don't want to go into the theoretical background, a logical first step is to assume that the corrections are always linear in the air pressure. The one thing I ignored in my first model was the influence of temperature on the pressure at a given height. You use it right away.
    Maybe you could give some background information on how you computed the corrections: which years did you use, top 5 results or complete results, only international races or also national races etc.

    ELO model
    The ELO rating model uses a normal distribution. According to the Dutch Wikipedia page, it is based on a class width of 200 points, and this class width is set equal to the standard deviation of a player's performance.
    The standard deviation of a match between two players becomes: σ = 200√2, which can be approximated by 2000/7. This means that a difference of 100 points is 0.35 σ.
    When I take two players in my statistical model (normal distribution) with the same σ and give the better player a 0.35σ better average, the chance of winning is 60%. So it differs from ELO! I have to make the difference 0.5σ to get the same winning chance (64%) as ELO. Does anyone know why these two situations don't match?
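One likely explanation for the mismatch concerns units. The 0.35σ figure is measured in units of the match standard deviation (200√2). A two-skater normal model with per-skater σ divides the difference by σ√2 again, so the √2 is applied twice. Expressed in per-skater σ, 100 points is 100/200 = 0.5σ, which is exactly the value that gives 64%. The snippet checks this numerically. The small remaining gap (0.638 vs 0.640) comes from the logistic formula quoted earlier in the thread, which only approximates the normal curve.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


SIGMA = 200.0                         # per-player sd (Dutch wiki convention)
SIGMA_MATCH = SIGMA * math.sqrt(2.0)  # sd of the performance difference

diff = 100.0
# ELO with normal performances: 100 points = 0.35 sigma_match
p_elo_normal = normal_cdf(diff / SIGMA_MATCH)
# ELO logistic formula as quoted in the thread
p_elo_logistic = 1.0 / (1.0 + 10 ** (-diff / 400))
# Two-skater normal model, mean gap given in per-skater sigma;
# the difference of the two performances again has sd sigma*sqrt(2)
p_two_skaters_05 = normal_cdf(0.5 * SIGMA / SIGMA_MATCH)
p_two_skaters_035 = normal_cdf(0.35 * SIGMA / SIGMA_MATCH)
```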

    Ranking model analysis - ELO values
    Back to your model: using the formulas per distance in Excel, I've computed which times in Heerenveen and SLC correspond to certain ELO values. It seems that the highest values you use for the best times are roughly between 2300 and 2500. The results are listed in the first table. Clearly, the 1000m men, 5000m and 1500m ladies are underrated compared to the other distances. On the other hand, the 10000m men is overrated. In chess, the top players have ELO values above 2700 and some above 2800, so maybe it's an idea to use higher values?

    Height correction
    From the time differences between Heerenveen and SLC, the height correction percentage (i.e. per 1% difference in air pressure) can be computed; it is shown on the fourth row of each distance. The results for ELO = 2400 are 0.01 to 0.03% higher than the values in my model, as shown in the fourth column of the second table. The 10km men and 1500m ladies have the biggest differences. It can also be seen that the corrections decrease at lower ELO values. Although a decrease is correct, it is more than 10 times too large compared to the values given by the power formula.

    [Image: Model_analyse_per_afstand.jpg]

    [Image: Model_analyse_afwijking.jpg]

    Standard deviation
    From the time differences between the 2300 and 2400 ELO values, the standard deviation you used can be computed (sixth column of the first table). Values for ladies are higher than for men, and they become relatively larger as the distance increases.
    Comparing these values to the standard deviations of corrected times by my model can be done by taking the top 20 and computing the average values.
    This is shown in the second table. The values are roughly two times the values in my model.

    Formulas
    The structure is: ELO = a + b * pressure + c * time, with a to c being constants.
    The choice of c is based on the standard deviation: increasing |c| means a greater standard deviation. The choice of b is based on the time correction for pressure. However, corrections of different distances cannot be compared with each other, since the ratio of c between the distances is not linear in the time. This brings me to the point of choosing the corrected time as the variable in the formula instead of the uncorrected time. The benefit is that it simplifies the formula to:
    ELO = a + b * corrected time. The corrected time can be written as: corrected_time = time * (1 + ((1013.25 - pressure_at_ice_rink)/1013.25) * correction_%).
    For example, the formula for the 500m men would be (in Excel): ELO =26347-A2*695.41*(1+((1013.25-B2)/1013.25)*C3)
    A2 = time
    B2 = pressure at ice rink
    C3 = correction %

    Another benefit is that the correction percentages are the same for all ELO values. Since the speeds don't differ enough to affect the correction percentage, there's no need to correct for this.
    The final benefit is that the formula is self-explanatory, and you only have to worry about choosing the correct value of 'a', since the value of b depends on the standard deviation and is valid for all corrected times.
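Here is the 500m men formula as a Python function, as a sketch. The thread does not give the value of the correction percentage C3, so the 0.2 used in the checks is an arbitrary placeholder. With the rink pressure at 1013.25 hPa the correction term drops out, which makes that case easy to check by hand (26347 - 695.41 * 34.31 ≈ 2487.48).

```python
def corrected_time(time, rink_pressure, correction):
    """time * (1 + ((1013.25 - pressure) / 1013.25) * correction)."""
    return time * (1 + ((1013.25 - rink_pressure) / 1013.25) * correction)


def elo_500m_men(time, rink_pressure, correction):
    """ELO = a + b * corrected time, with a = 26347 and b = -695.41."""
    return 26347 - 695.41 * corrected_time(time, rink_pressure, correction)
```

With b = 695.41 points per second, a 100-point ELO gap on the 500m corresponds to about 100/695.41 ≈ 0.144 s of corrected time.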

    Quartet starts
    - The correction for quartet starts is not used for the 5km ladies/10km men. Why?
    - The correction for quartet starts is about 0.6 s (=0.25%) for the 3km ladies and about 2.7 s (=0.75%) for the 5km men, assuming k=1. Personally, I'd use the same (low) percentage for all quartet starts, because it avoids discussions about times skated in the B group being just a bit better than the winning time in the A group. The choice is hard to make, since there is little information to compare with. But I think the advantage would not exceed 1 second each time a skater catches up with someone, when conditions are the same for the A and B groups.

    Validation of ratings
    To validate the ratings, section 9.2 of https://www.schaakbond.nl/knsb/handboek/rating/Rekenregels KNSB Ratingsysteem.pdf mentions a recalculation of the end ratings. Does anyone know how this works? Do they start with the highest-ranked player and so on? If a rating changes, it affects the ratings of the other players.

    Also, http://www.wikiwand.com/nl/Elo-rating#/De_betrouwbaarheid_van_ratings describes a test to validate the ratings.
    Last edited: 11 Jun 2016
