Why the R Stats Programming Language is horrible

What did you expect? You gave some hotshot statisticians and data scientists free rein to create a programming language, and from my perspective they made some bad choices.

My main concern is that exceptions are not thrown. There is no overseer making sure you do not get things wrong and do not fall out of line. Too much of the onus falls on the developer (data scientist) to ensure he/she is not messing up.

The issues I have:

  • NA (Not Available) and the issues of comparison
  • One-based indexing and referring to elements
  • No return statement

NA

R has what computer scientists would call a bool. Except instead of something like Python’s None, they have this missing value called NA. So they have TRUE, FALSE and NA.

They have an apparently smart reason for this: missing values are very important to them. So important, in fact, that they have decided to make working with NA a most horrible and onerous never-ending experience.

In `R`, `NA` is used to represent any value that is ‘not available’ or ‘missing’ (in the statistical sense).

Importantly: any operation involving NA generally yields NA as the result.

Take any mathematical or logical operation:


> NA + 1
[1] NA
> NA * 3
[1] NA
> NA == TRUE
[1] NA
> NA == FALSE
[1] NA

Naturally the onus is on us to just know something is wrong, as no error is thrown. I can see that they want NA to be different from FALSE because it is important. So important that all other valid values are now apparently less important than the almighty NA, since any operation involving NA turns that decent value into the horror that is NA.
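To be fair, there is an escape hatch. The sanctioned way to test for missing values is is.na(), and most aggregation functions take an na.rm argument that drops them. A quick sketch of the workflow as I understand it so far:

> x <- c(1, NA, 3)
> is.na(x)
[1] FALSE  TRUE FALSE
> mean(x)
[1] NA
> mean(x, na.rm = TRUE)
[1] 2

And while vectorised operations propagate NA silently, an if condition that receives NA does at least blow up with ‘missing value where TRUE/FALSE needed’.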

Maybe I’ll change my mind later but it seems very bad right now.

One-Based Indexing

Wow, clearly this thing was not designed with computer scientists in mind. I don’t really mind one-based indexing in itself, except that it is inconsistent with other programming languages and won’t play well with them.

So R is special.

So special that it need not throw exceptions when you reference vector elements that do not exist, or even when you reference the zero index of a vector. You know, because data scientists never make mistakes and naturally never need to be reminded of the craziness of this language.

Take this example:


> my_vect = c(1, 4, 5)
> my_vect
[1] 1 4 5
> my_vect[1]
[1] 1
> my_vect[2]
[1] 4
> my_vect[3]
[1] 5
> my_vect[0]
numeric(0)
> my_vect[4]
[1] NA

Not a single exception was thrown that day.

How can we expect to get good results from these data scientists when the language allows you to be this error prone?

No warning is even given for accessing index zero, an element that does not exist…ever.
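If you want an exception, you have to throw it yourself with stop(). A minimal sketch, where safe_get is just a name I made up for a bounds-checked accessor:

> safe_get <- function(v, i) {
+   if (i < 1 || i > length(v)) stop("index out of bounds: ", i)
+   v[i]  # in-range access behaves as normal
+ }
> safe_get(my_vect, 2)
[1] 4
> safe_get(my_vect, 4)
Error in safe_get(my_vect, 4) : index out of bounds: 4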

Again this is just first impressions.

That’s not all, folks! Heard about NULL?

Yes, there is NULL: ask for dim(my_vect) on a plain vector and you get NULL back.

But lo and behold:


> dim(my_vect)
NULL
> dim(my_vect) == FALSE
logical(0)

There is also an isTRUE() that can be used to see if the value passed in is true. In that case isTRUE(NA) will evaluate to FALSE. Although so will isTRUE(3).
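So a working pattern seems to be: use is.null() to test for NULL, and isTRUE() when you need a comparison that cannot leak NA. A small demonstration:

> isTRUE(TRUE)
[1] TRUE
> isTRUE(NA)
[1] FALSE
> isTRUE(3)
[1] FALSE
> is.null(dim(my_vect))
[1] TRUE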

No Return Statement

Another weird thing: the last evaluated expression of a function is what the function returns. So how do you return something earlier in the function? At first I assumed conditionals, which I was yet to explore.
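Having poked at it a little more, it turns out R does ship an explicit return() function for early exits; it is just that idiomatic R leans on the implicit last-expression return. A sketch, with classify being a made-up example function:

> classify <- function(x) {
+   if (is.na(x)) return("missing")             # explicit early return does exist
+   if (x > 0) "positive" else "not positive"   # last evaluated expression is the return value
+ }
> classify(NA)
[1] "missing"
> classify(5)
[1] "positive"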

Thing To Remember

This language is not here to help you. It is supposed to make things statistically simpler for you, but being cautious and rigorous is very much for you to do on your own. You will have no assistance.

That being said…

Creating graphics, graphs, charts and plots with base R graphics, lattice and ggplot2 is a great experience.