Author: stephen

Why the R Stats Programming Language is horrible

What did you expect? You gave some hotshot statisticians and data scientists free rein to create a programming language and, from my perspective, they seem to have made some bad choices.

My main concern is that exceptions are not thrown. There is no overseer ensuring you do not get things wrong or fall out of line. Too much of the onus falls on the developer (data scientist) to ensure he/she is not messing up.

The issues I have:

  • NA (Not Available) and the issues of comparison
  • One-based indexing and referring to elements
  • No return statement


R has what computer scientists would call a bool. Except instead of None they have this missing value called NA. So they have TRUE, FALSE and NA.

They have an apparently smart reason for this because missing values are very important to them, so important in fact, that they have decided to make working with NA a most horrible and onerous never-ending experience.

In `R`, `NA` is used to represent any value that is ‘not available’ or ‘missing’ (in the statistical sense).

Importantly: any operation involving NA generally yields NA as the result.

This goes for almost any mathematical or logical operation:

> NA + 1
[1] NA
> NA * 3
[1] NA
> NA == TRUE
[1] NA
> NA == NA
[1] NA

Naturally the onus is on us to just know something is wrong, as an error is not thrown. I can see that they want NA to be different from FALSE because it is important. So important that all other valid values are now apparently less important than the almighty NA, as any operation done with NA renders that decent value into the horror that is NA.

Maybe I’ll change my mind later but it seems very bad right now.

One Based Indexing

Wow, clearly this thing was not designed with computer scientists in mind. I don’t really mind it except that it lacks consistency and won’t play well with other programming languages.

So R is special.

So special that it need not throw exceptions when you reference vector elements that do not exist, or even when you reference index zero of a vector. You know, because data scientists never make mistakes and naturally never need to be reminded of the craziness of this language.

Take this example:

> my_vect = c(1, 4, 5)
> my_vect
[1] 1 4 5
> my_vect[1]
[1] 1
> my_vect[2]
[1] 4
> my_vect[3]
[1] 5
> my_vect[0]
numeric(0)
> my_vect[4]
[1] NA

Not a single exception was thrown that day.
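
For contrast, here is a minimal Python sketch of the "fail loudly" behaviour this post keeps asking for. The list mirrors my_vect above; the helper name get_or_error is made up for illustration and only exists to print the error instead of stopping the script.

```python
my_vect = [1, 4, 5]


def get_or_error(values, index):
    """Return values[index], or a description of the IndexError it raised."""
    try:
        return values[index]
    except IndexError as exc:
        return f"IndexError: {exc}"


print(get_or_error(my_vect, 0))  # first element, since Python is zero-based
print(get_or_error(my_vect, 3))  # one past the end raises IndexError
```

Python has its own surprises (negative indexes count from the end), but asking for an element past the end is an error rather than a quiet NA.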

How can we expect to get good results from these data scientists when the language allows you to be error prone?

No warning is even given for accessing a zero-indexed element that does not exist…ever.

Again this is just first impressions.

That’s not all folks! Heard about NULL?

Yes, there is NULL; ask for dim(my_vect) on a vector.

But lo and behold:

> dim(my_vect) == FALSE
logical(0)

There is also an isTRUE() that can be used to see if the value passed in is TRUE. In that case isTRUE(NA) will evaluate to FALSE. Although so will isTRUE(3).

No Return Statement

Another weird thing: the value of the last evaluated expression in a function is what the function returns. So how do you return something earlier in the function? R does provide return() for that, but it is a function call rather than a statement, and the implicit version seems to be the common style.

Thing To Remember

This language is not here to help you. It is supposed to make things statistically simpler for you but being cautious and rigorous is very much for you to do on your own. You will have no assistance.

That being said…

Creating graphics, graphs, charts and plots with R base charts, lattice and ggplot2 is a great experience.

Setting up a new Macbook for development

A mac is set up for the default user and usually requires a few things to make it into a development machine. A fresh ubuntu install is certainly more developer focused than a macbook. In this post I will walk you through what I do when setting up a new macbook for development.

  1. Install homebrew
  2. Install wget
  3. Install python with brew
  4. Download vs code
  5. Download the python extension for vscode
  6. Make tanner terminal the default

Then, to improve the bash (terminal), git and vim prompts etc., we are going to use nicolas’s dotfiles, but to use them we have to install xcode from the appstore first. You may then need to follow this stackoverflow answer if you had previously installed only the xcode-command-line tools.

Issues with the dotfiles setup

A few issues after installing the dotfiles are:

  • The annoying doink sound when pressing esc and then : in vim
  • The prompt is just showing the current location (it does not show the currently logged in user and domain)
  • When moving the cursor or backspacing, if you hold it down for half a second half the characters will be skipped or deleted

The annoying doink sound when pressing esc and then : in vim

You can change this sound in settings -> sound -> alert sound

Cursor Backspace Issue

To fix the cursor and crazy fast backspace issue:

defaults write NSGlobalDomain ApplePressAndHoldEnabled -bool true
defaults write NSGlobalDomain InitialKeyRepeat -int 25
defaults write NSGlobalDomain KeyRepeat -int 6

Changing Prompt colour

Make sure you install the tanner terminal theme and set it as the default terminal theme.


Incorrect Prompt

The prompt is not as I expected

It is showing:


Folders and files with different permissions are not different colours

Try ls to have different colour output

I added an alias to ~/.bash_profile

alias ls='ls -G'

Relative paths are tab-completed to absolute paths


Terminal not exiting

Make the terminal exit when you type exit and avoid this annoying exit that doesn’t actually exit:


You can fix this by changing the terminal settings


Finding a Bloody Decent Django Reporting Tool

I have never liked creating reports and custom tables based on filters for end users. I don’t think any developer does.

It feels like a very generic task and can be achieved more easily and with greater agility with the use of SQL. A group_by does not really exist in django. It smartly hides that from the developer, and rightly so. It also seems that django was never strong on reporting, and was more about reporting on news events…
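
To show how little SQL such a report needs, here is a minimal sketch using Python's built-in sqlite3 module. The time_entry table, its columns and the sample rows are all hypothetical; they are not from any real project schema.

```python
import sqlite3

# Hypothetical schema: one row per time entry logged against a project.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_entry (project TEXT, username TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO time_entry VALUES (?, ?, ?)",
    [
        ("website", "alice", 2.0),
        ("website", "bob", 3.5),
        ("api", "alice", 1.5),
    ],
)

# The whole "report" is a single GROUP BY query.
rows = conn.execute(
    "SELECT project, SUM(hours) FROM time_entry GROUP BY project ORDER BY project"
).fetchall()

for project, total_hours in rows:
    print(project, total_hours)
conn.close()
```

The hard part of a reporting tool is everything around this query: the filter forms, the validation and the output formatting.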

Unfortunately, when a client now asks for totals, group bys and custom filtering…django-tables2 just does not cut it anymore. So, as a lazy developer, let us find something that can save us from reinventing the wheel and writing so much damn code.

What do I need out of the reporting package?

  • Automatic filters – filters of related fields, dates and relevant validation should be created automatically and be customisable
  • Fields to show, group by and aggregations should be options when generating the report table
  • Subtotals and totals should show based on the group by options selected
  • Formats of output should be automatic based on django settings and be able to be overridden
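
The subtotal and total requirement can be sketched in plain Python. This is only an illustration of the output I am after, not how any of the packages below work; the (client, project, hours) rows and the grouped_report helper are made up.

```python
from collections import defaultdict

# Hypothetical report rows: (client, project, hours).
rows = [
    ("acme", "website", 2.0),
    ("acme", "api", 1.5),
    ("globex", "website", 4.0),
]


def grouped_report(rows):
    """Return per-client subtotals and the grand total of hours."""
    subtotals = defaultdict(float)
    for client, _project, hours in rows:
        subtotals[client] += hours
    return dict(subtotals), sum(subtotals.values())


subtotals, total = grouped_report(rows)
for client, hours in sorted(subtotals.items()):
    print(f"{client}: {hours}")
print(f"total: {total}")
```

A real package would let the end user pick the group by field instead of hard-coding the client.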

But a picture paints 1000 words so here are some screenshots of what I am trying to create:

The main report screen:

Available options that update dependent fields and uncheck users not linked to specific projects or clients:


The Grouped by report with totals and sub totals:

Django Reporting Packages

These are some of the packages I found. There are some more and a bit more info at

Updated (2018-02-23)

Django Model Report

I thought django-model-report was the white knight but it does not support python3.

It has exactly what I am looking for in a report, check this screenshot (grouped by):

Unfortunately it does not support python 3, so I have to throw it out of the mix.

Django Report Builder

A good offering that works with python3, but it lacks the ability to select from a dropdown when filtering. You need the exact id or name; there is no dropdown of available options.


You can export to xls or csv, but there are some issues when translating, displaying and aggregating django native model fields like DurationField.

django-report-builder display fields issue
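
Django's DurationField values are Python datetime.timedelta objects, so one possible workaround is to aggregate and format them yourself before exporting. A minimal sketch, assuming the durations have already been pulled out of the queryset; format_duration is a hypothetical helper and not part of django-report-builder.

```python
from datetime import timedelta

# Hypothetical durations, as they would come back from a DurationField.
durations = [
    timedelta(hours=1, minutes=30),
    timedelta(minutes=45),
    timedelta(hours=26),
]


def format_duration(value):
    """Format a timedelta as H:MM, letting the hours run past 24."""
    total_minutes = int(value.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


# sum() needs a timedelta start value, otherwise it starts from the int 0.
total = sum(durations, timedelta())
print(format_duration(total))
```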


A decent offering but you need to know SQL well and for me the schema was not showing at all. So it was difficult to guess the table and column names.