Tuesday, March 6, 2012

Excel 2007 PERCENTRANK is trash.


People and software do not always mean the same thing when they talk about percentiles, percent rank, and so on. Do not expect different software to give the same values. In particular, Excel uses a method that is probably not what you expect and does not correspond to methods implemented in scientific software.

For Excel 2007, in the case of getting a PERCENTRANK for a value that appears in the range, you will actually get (the number of items strictly less than the value) / (the total number of items minus one). This has the nice feature of (at least for distinct values) giving percent ranks that range from zero to one inclusive. It has the nasty feature of almost certainly not being what you thought it was going to be, and not being what you'll get from SAS, R, SPSS, SciPy, etc. (It is, however, mimicked fairly well in other spreadsheet software.)

It isn't immediately obvious how Excel works out the PERCENTRANK for values that don't appear in the range. Some sort of interpolation, certainly - but not one that was easy for me to guess quickly. I'd love to know what the heck it is.

And it isn't just that Excel is non-standard - it also appears to be buggy. Here's one bizarre example I came across of Excel 2007 at work, in which PERCENTRANK is not stable when values are multiplied (or divided) by 100, sometimes giving the same percent rank for different values, sometimes giving different percent rank for the same values. Check out the rows in bold. You should be able to replicate this in Excel 2007 if you like (with nine digits of precision requested from PERCENTRANK).


value PERCENTRANK value/100 PERCENTRANK
96.775 1 0.96775 1
93.6625 0.954545454 0.936625 0.954545454
93.3 0.909090909 0.933 0.909090909
93.0125 0.863636363 0.930125 0.863636363
92.7875 0.772727272 0.927875 0.818181818
92.7875 0.772727272 0.927875 0.772727272
92.475 0.727272727 0.92475 0.727272727
92.0625 0.681818181 0.920625 0.681818181
91.5 0.636363636 0.915 0.636363636
91.275 0.59090909 0.91275 0.59090909
91.0875 0.545454545 0.910875 0.545454545
90.9125 0.5 0.909125 0.5
90.9 0.454545454 0.909 0.454545454
90.8375 0.409090909 0.908375 0.409090909
90.2625 0.363636363 0.902625 0.363636363
89.425 0.318181818 0.89425 0.318181818
89.0625 0.272727272 0.890625 0.272727272
88.4375 0.227272727 0.884375 0.227272727
88.1 0.181818181 0.881 0.181818181
83.325 0.136363636 0.83325 0.136363636
82.3375 0.09090909 0.823375 0.09090909
78.15 0.045454545 0.7815 0.045454545
71.5125 0 0.715125 0


The moral of the story? DON'T USE EXCEL FOR ANYTHING, BUT ESPECIALLY NOT MATH.

No comments: