Month: October 2019

Open-Source Single Sign-On (SSO) and IAM


On Wikipedia you can find a list of all SSO platforms and frameworks, along with the licenses of each product. You will see there are many proprietary solutions, which makes things difficult as they are harder to test out.

We are trying to solve the problem of having a single user and security system, which makes maintenance and security easier.

Furthermore, with proprietary solutions you don't have a community of people looking at the code and finding bugs and security issues in the implementation. I also don't like the model of only giving certain features to paid users - security is ubiquitous and fundamental.

Some of the proprietary / commercial players:

  • Gluu
  • Okta
  • Auth0
  • Amazon Cognito
  • Aerobase

Some of the open source projects are old and difficult to test out.

Open Source Projects

However, some kind soul has created a gist comparing open-source single sign-on and IAM solutions. I have added the table below in case the author decides to delete it:

|                             | Aerobase       | Keycloak                   | WSO2 Identity Server | Gluu                | CAS                 | OpenAM              | Shibboleth IDP |
|-----------------------------|----------------|----------------------------|----------------------|---------------------|---------------------|---------------------|----------------|
| OpenID Connect              | yes            | yes                        | yes                  | yes                 | yes                 | yes                 | third-party    |
| Multi-factor authentication | yes            | yes                        | yes                  | yes                 | yes                 | yes                 | yes            |
| Admin UI                    | yes            | yes                        | yes                  | yes                 | yes                 | yes                 | no             |
| Identity brokering          | yes            | yes                        | yes                  |                     |                     |                     |                |
| Middleware                  | NGINX, WildFly | WildFly, JBoss             | WSO2 Carbon          | Jetty, Apache HTTPD | any Java app server | any Java app server | Jetty, Tomcat  |
| Commercial support          | yes            | no                         | yes                  | yes                 | third-party         | yes                 | third-party    |
| Installation difficulty     | easy           | easy (docker on openshift) | hard                 |                     |                     |                     |                |
| First release               |                | 2014                       | 2008                 |                     |                     |                     |                |

It is also important to look at the OpenID Certification and ensure the product or project you choose has been certified.

That is important as there are essentially only two single sign-on protocols: SAML and OpenID Connect.
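A quick way to poke at any compliant OpenID Connect provider is its discovery document, which is published at a well-known URL. A minimal sketch in Python - the realm URL below is hypothetical, using Keycloak's URL pattern:

import requests

# OpenID Connect providers publish their metadata at a well-known URL.
# The realm URL below is hypothetical - adjust for your own deployment.
realm = 'https://sso.example.com/auth/realms/demo'
config = requests.get(realm + '/.well-known/openid-configuration').json()

print(config['issuer'])
print(config['authorization_endpoint'])
print(config['token_endpoint'])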

For me there are 2 clear winners: Keycloak and WSO2.

Update: Oops, I thought that django-oidc-provider was an OpenID client - but it is not, it is a provider. It is in the same category as Keycloak, WSO2 and Dex. I haven't dug too deep into it - just installed it.

Keycloak

  • Keycloak is the open-source project behind Red Hat SSO, the commercial derivative, which costs $8000 a year.
  • No patches for the Community Version
  • Users, Roles and Groups
  • User Stores: Single data source
  • Single sign-on: SAML2 and OpenID Connect
  • Fully featured attribute mapping
  • No per-application identity provider
  • Only inbound user provisioning
  • Superuser can manage all realms
  • OTP: Timebased OTP (TOTP), Counter-based OTP (HOTP) and Google Authenticator QR code
  • Multistep Auth: Limited, with a set of predefined actions like update password, terms and conditions, etc.
  • Easier and more user-friendly, with a modern UI
  • Functionality is more rigid

WSO2 Identity Server

  • Commercial support at €19,320 a year.
  • No patches for the community version
  • Users and Roles only
  • User Stores: Multiple data stores
  • Single sign-on: SAML2 and OpenID Connect
  • Fully featured attribute mapping
  • Has per-application identity provider
  • Inbound and outbound user provisioning with per-application config
  • Superuser cannot manage all tenants - only tenant admin
  • OTP: SMS, Email, Timebased OTP (TOTP) and Google Authenticator QR code
  • Multistep Auth: More flexible, but with more complexity
  • Harder to install and configure, and the UI is a bit old
  • Functionality is very open


Analysing Response Time Differences from Apache Logs from PHP5.6 to PHP7.3

Recently I added response times to my Apache logs, to keep track of how long server responses take.

To do that, add %D to your Apache config (apache2.conf):


LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %D" vhost_combined
# Added response time %D
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %D" combined

About 7 days after implementing this, I updated the PHP version on the server to PHP 7.3.

Getting Logs

I copied the logs to a folder with (all those after the 17th of October):

find . -type f -newermt 2019-10-17 | xargs cp -t /home/stephen/response_time/

Now that I have the logs, what I want to do is unzip them and aggregate them all into a single pandas dataframe.

Creating the Dataframe

Just get the damn data - it doesn't need to be pretty:

Example of a log entry:


# www.how-to-trade.co.za:443 105.226.233.14 - - [30/Oct/2019:16:27:11 +0200] "GET /feed/history?symbol=IMP&resolution=D&from=1571581631&to=1572445631 HTTP/1.1" 200 709 "https://how-to-trade.co.za/chart/view" "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko" 35959

I then set up:


import pandas as pd

fields = ['host', 'remote_ip', 'A', 'B', 'date_1', 'timezone',
          'path', 'RESPONSE_CODE', 'NUMBER', 'unknown', 'USERAGENT', 'RESPONSE_TIME']

df = pd.DataFrame()
for i in range(15):
    data = pd.read_csv('how-to-trade-access.log.{0}.gz'.format(i), compression='gzip',
                       sep=' ', header=None, names=fields, na_values=['-'])

    time = data.date_1 + data.timezone
    time_trimmed = time.map(lambda s: s.strip('[]').split('-')[0].split('+')[0])  # Drop the timezone for simplicity
    data['time'] = pd.to_datetime(time_trimmed, format='%d/%b/%Y:%H:%M:%S')

    data = data.drop(columns=['date_1', 'timezone', 'A', 'B', 'unknown'])

    df = pd.concat([df, data])

So now we have a nice data frame...

[image: apache-logs-data-frame]

Basic Data Analysis

So we are just looking at the how-to-trade.co.za site.

Number of rows: 211388

Most frequent IPs

[image: frequent-ips-apache-logs-how-to-trade]


Most Common Response Codes

[image: common-response-codes]

The single 206 Partial Content response is interesting.

The 304 Not Modified responses are expected, but the 500 Internal Server Errors need to be looked into. The rest are pretty standard.

UserAgents

Checking the top 5 user agents, it seems that Semrush is an abuser.

[image: useragents-semrush-abusing]

It is very important that I add this to my robots.txt.
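SemrushBot does respect robots.txt, so two lines should do it (a sketch, assuming I want to block it entirely):

User-agent: SemrushBot
Disallow: /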

Number of Responses per Day

[image: responses-per-day]

It would be nice if I could put a percentage here and also a day of the week so I can make some calls.

I made the logging change on 17 October, so I need to drop all rows that don't have a response time.

I then changed to PHP7.3 on 23 October 2019.

I then got the logs today.

So let me get an average of the response times (in milliseconds).

Average Response Time

I get the mean response time per day, and the chart doesn't really show a big drop after the PHP upgrade on 23 October 2019.
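Roughly the code behind the chart - a sketch, assuming the dataframe built earlier (%D logs microseconds, so I convert to milliseconds):

# Rows logged before %D was added have no response time - drop them
df = df.dropna(subset=['RESPONSE_TIME'])

# Convert microseconds to milliseconds and average per day
df['response_ms'] = df['RESPONSE_TIME'] / 1000
daily_mean = df.set_index('time')['response_ms'].resample('D').mean()
daily_mean.plot(title='Mean response time per day (ms)')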

[image: mean-response-time-per-day]

Apache gives the response time in microseconds, so 32606 microseconds is 32.61 milliseconds.

PHP5 vs PHP7 Benchmarked

Here is the data on the mean response times:

[image: php5-vs-php7-benchmark]

For PHP7.3 the mean response time was: 29.544 ms

For PHP5.6 the mean response time was: 43.456 ms
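The split is just a cutoff on the upgrade date - a sketch using the response_ms column from above:

# PHP 5.6 before the upgrade on 23 October 2019, PHP 7.3 after
cutoff = pd.Timestamp('2019-10-23')
php5 = df[df['time'] < cutoff]['response_ms']
php7 = df[df['time'] >= cutoff]['response_ms']

print('PHP 5.6 mean: {:.3f} ms'.format(php5.mean()))
print('PHP 7.3 mean: {:.3f} ms'.format(php7.mean()))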

Error Rates

Looking at the response codes as percentages, nothing stands out particularly, apart from an increase in 500 errors on PHP 7 that might need to be checked out.
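The percentages come from something along these lines, reusing the cutoff from above:

# Response code distribution per PHP version, as percentages
for name, frame in [('PHP 5.6', df[df['time'] < cutoff]),
                    ('PHP 7.3', df[df['time'] >= cutoff])]:
    print(name)
    print(frame['RESPONSE_CODE'].value_counts(normalize=True) * 100)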

[image: php-7-vs-php-5-errors]

Finding Slow Pages

Now let's find which pages are really slow, and look to remove or change them:
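A sketch of how to surface them, pulling the URL out of the request line and averaging per page:

# 'path' holds the full request line, e.g. "GET /chart/view HTTP/1.1"
df['url'] = df['path'].str.split().str[1]

slowest = (df.groupby('url')['response_ms']
             .agg(['mean', 'count'])
             .sort_values('mean', ascending=False))
print(slowest.head(10))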

[image: pandas-really-slow-requests]

It is clear that the /share/performance page is a problem.

Things I found analysing the logs of Other Sites

Check which bots are getting your stuff. I found there was a Turnitin crawler, which is used to find plagiarism. The thing is - I don't mind if people copy that word for word. I don't want them to get in trouble for it.

Conclusion

So this is not the most scientifically correct study, but there is enough evidence here to say that:

Response time on PHP7.3 is on average 13.912ms faster, which is a 32.01% reduction in response time.

From our tests PHP7.3 is 32.01% faster in terms of page response time
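The percentage is just the saving relative to the PHP 5.6 baseline:

php5_mean, php7_mean = 43.456, 29.544
saving = php5_mean - php7_mean  # 13.912 ms
print('{:.2f}% faster'.format(saving / php5_mean * 100))  # 32.01%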

Maximum and median response times also indicate the same trend.

[image: visualise-response-time-change-php7]

If you have ideas on improving the data science and analytics of this post, I would love to hear them!

Containerising your Django Application into Docker and eventually Kubernetes

The shift to containers is happening, in some places faster than others...

People underestimate the complexity and all the parts involved in making your application work.

The Django Example

In the case of Django, we would in the past have (traditionally) deployed it on a single server running:

  • a web server (Nginx)
  • a Python WSGI server - web server gateway interface (Gunicorn or uWSGI)
  • a database (SQLite, MySQL or PostgreSQL)
  • Sendmail
  • maybe some other stuff: Redis for cache and user sessions

So the server would become a snowflake very quickly as it needs to do multiple things and must be configured to communicate with multiple things.

It violates the single responsibility principle.

But that was the way we understood it. Now there is a bit of a mind shift when Docker is brought in.

The key principle is:

Be stateless, kill your servers almost every day

Taken from Node Best Practices

So what does that mean for our Django application?

Well, we have to think differently. Now for each process we are running we need to decide if it is stateless or stateful.

If it is stateful (not ephemeral) then it should be set aside and run in a traditional manner (or run by a cloud provider). In our case the stateful part is luckily only the database. When I say stateful, I mean the state needs to persist... forever. User sessions, cache and email do need to work and persist for shorter time periods - it won't be a total disaster if they fail. Users will just need to re-authenticate.

So the other parts, which can all run in containers, are:

  • Nginx
  • Gunicorn
  • Sendmail

For simplicity's sake I'm going to gloss over Redis as cache and user session store. I'm also not that keen to include Sendmail because it introduces more complexity and another component - namely message queues.

Let's start containerising our Django application

Alright, so I'm assuming that you know Python and Django pretty well and have at least deployed a Django app into production (the traditional way).

So we have all the code; we just need to get it running in a container - locally.

A good resource to use is ruddra's docker-django repo. You can use some of his Dockerfile examples.

First, install Docker Engine.

Let's get it running in docker using just a docker file. Create a file called Dockerfile in the root of the project.


# pull official base image - set the exact version of python
FROM python:3.8.0

LABEL maintainer="Your Name <your@email.com>"

# set environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

# Install dependencies
RUN pip install --no-cache-dir -U pip

# Set the user to run the project as, do not run as root
RUN useradd --create-home code
WORKDIR /home/code
USER code

COPY path/to/requirements.txt /tmp/
RUN pip install --user --no-cache-dir -r /tmp/requirements.txt

# Copy Project
COPY . /home/code/

# Documentation from person who built the image to person running the container
EXPOSE 8000

CMD python manage.py runserver 0.0.0.0:8000

A reference on the Dockerfile commands
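It is also worth adding a .dockerignore next to the Dockerfile so COPY doesn't drag the git history and local artifacts into the image - a minimal sketch:

.git
__pycache__/
*.pyc
venv/
db.sqlite3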

Remember to update the settings of the project so that:

ALLOWED_HOSTS = ['127.0.0.1', '0.0.0.0']

Now let us build the image and run it:


docker build . -t company/project
docker run -p 8000:8000 -it --name project company/project

Now everything should just work! Go to: http://0.0.0.0:8000