Displaying a network chart graph diagram on a Django site

Visualisations often make things easier to understand.
Python provides a number of libraries for creating great visualisations, but they tend to be geared towards a data science workflow of scripts and Jupyter notebooks.

What we want is for the visualisations to be easily accessible through a Django website. That is what I will show in this post, with a specific focus on network diagrams.

What Visualisation Package are we using for the Network Graph?

I have tried to find a few packages for creating a network graph:

Comparison

I will be using graphviz and networkx for very simple, rudimentary network graphs. You can make them look good, but it takes considerable effort.

I will let you try networkx with plotly and igraph on your own.

Graphviz

Graphviz has very simple input and output. There is not too much fuss and it can render to many formats. To create a simple graph displayed on the frontend as an SVG:

from graphviz import Graph

g = Graph(
    'G',
    format='svg',
    engine='twopi',
)

g.node('root', shape='rectangle', width='1.5')
g.node('red')
g.node('blue')

g.edge('root', 'red', label='to_red')
g.edge('root', 'blue', label='to_blue')

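# g.pipe() returns the rendered SVG as bytes; context_data is the template context built in the view
context_data['my_chart'] = g.pipe().decode('utf-8')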

Display on frontend:

{{ my_chart | safe }}

The image output:

[Image: graphviz-simple-radial-django]

NetworkX

NetworkX is not primarily a graph drawing package but basic drawing with Matplotlib as well as an interface to use the open source Graphviz software package are included.

The example below draws the graph with Matplotlib (networkx can also hand the drawing to Graphviz) and captures the result as SVG in an in-memory buffer.

    import io

    import matplotlib.pyplot as plt
    import networkx as nx

    G = nx.Graph()

    # The Graphviz node styling (rectangle, width 1.5) is not carried over here
    G.add_node('root')
    G.add_node('red')
    G.add_node('blue')

    # The edge labels (to_red, to_blue) are not set in this simple example
    G.add_edge('root', 'red')
    G.add_edge('root', 'blue')

    nx.draw(G)

    # Render the figure to an in-memory buffer as SVG
    buf = io.BytesIO()
    plt.savefig(buf, format='svg', bbox_inches='tight')
    svg_markup = buf.getvalue().decode('utf-8')
    buf.close()
    # Close the figure so it does not leak into later requests
    plt.close()

    context_data['my_chart'] = svg_markup

Display on frontend:

{{ my_chart | safe }}

The image output:

[Image: networkx-simple-django-network]

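Note that this output has no labels on the nodes or edges.

Make sure you close the plot after saving it, otherwise figures from one request can leak into the next. Matplotlib's pyplot interface is not thread safe.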
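One way to reduce the risk is to let only one thread draw at a time. The sketch below is a minimal illustration of that idea and not something from the original setup: `render_exclusively` and `draw_chart` are hypothetical names, and it assumes all drawing happens in threads of a single process.

```python
import threading

# One lock shared by every request thread in this process.
_render_lock = threading.Lock()


def render_exclusively(draw_chart):
    """Run draw_chart() while holding the lock and return its result.

    draw_chart is a hypothetical zero-argument callable that builds the
    figure, saves it to a buffer, closes the figure and returns the SVG text.
    """
    with _render_lock:
        return draw_chart()


if __name__ == "__main__":
    # Stand-in for the real drawing function, used only to show the call.
    def fake_chart():
        return "<svg></svg>"

    print(render_exclusively(fake_chart))
```

In a view you would wrap the networkx drawing code in a function and pass that function in. A lock like this does nothing across separate worker processes, but each process has its own pyplot state anyway.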

Important:

  • nx.draw(G) - draws with no labels
  • nx.draw_networkx(G) - draws with labels

[Image: networkx-with-labels]

Install Lightweight Bitcoin Core on Ubuntu

  1. Get a dedicated piece of generic hardware such as a Raspberry Pi, an old laptop or a tower, and use it solely for bitcoin purposes.

  2. Install Ubuntu (after verifying its SHA256 hash)

  3. Download Bitcoin Core

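    cd /opt
    sudo wget https://bitcoin.org/bin/bitcoin-core-0.20.1/bitcoin-0.20.1-x86_64-linux-gnu.tar.gz
    # Verify: download the published checksums
    sudo wget https://bitcoin.org/bin/bitcoin-core-0.20.1/SHA256SUMS.asc
    sha256sum bitcoin-0.20.1-x86_64-linux-gnu.tar.gz
    cat SHA256SUMS.asc
    # Ensure the hash printed by sha256sum matches the entry for the same file name

    # Extract the archive, then install the binaries
    sudo tar -xzf bitcoin-0.20.1-x86_64-linux-gnu.tar.gz
    sudo install -m 0755 -o root -g root -t /usr/local/bin bitcoin-0.20.1/bin/*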
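Comparing two long hashes by eye is error prone, so the check can also be scripted. This is a rough sketch using only Python's standard library. The function names are my own, and it assumes each checksum line in the file has the layout "<hex digest>  <file name>". It checks the hash only, not the PGP signature on the checksum file.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_hash(sums_text, filename):
    """Find the hash listed for filename in SHA256SUMS-style text, or None."""
    for line in sums_text.splitlines():
        parts = line.split()
        # Assumed line layout: "<hex digest>  <file name>"
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def matches_published_hash(archive_path, sums_path, filename):
    """True only if the archive's digest equals the published one."""
    with open(sums_path, encoding="utf-8") as handle:
        published = expected_hash(handle.read(), filename)
    return published is not None and published == sha256_of_file(archive_path)


if __name__ == "__main__":
    name = "bitcoin-0.20.1-x86_64-linux-gnu.tar.gz"
    print(matches_published_hash(name, "SHA256SUMS.asc", name))
```

Run it from the directory that holds both downloads, and only go on to the install step if it prints True.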


Using FuzzyWuzzy to Match Similar Strings and Making Tweaks to Improve it

I have been scraping betting odds from a few websites in my spare time, to cut down the time I spend manually checking the odds on the different sites.

I've been using FuzzyWuzzy to match up the event names from the different sites.

Example

I scrape an event name from a specific site:

name = 'Feyenoord Rotterdam - Wolfsberger AC'

Now I want to match it with an existing event in the database.

I get a list of potential events based on the sport and type, in this case Europa League football:

event_names = ['Ac Milan - Sparta Praha', 'Aek Athen - Leicester', 'Crvena Zvezd - Slovan Libe',
'Cska Moscow - Din Zagreb', 'Feyen Rotte - Wolfsberger', 'FK Qarabag - CF Villarreal',
'Gent - 1899 Hoffenh', 'Karabakh A - Villarreal', 'Lask Linz - Ludogo Raz', 'LASK Linz - PFC Ludogorets',
'Lille - Celtic', 'R Antwerp - Tottenham', 'Red Star Belgrade - FC Slovan Liberec',
'Sivasspor - M Tel Aviv', 'Zorya Lugan - Braga']

Then I call extractOne:

from fuzzywuzzy import fuzz, process

match, level = process.extractOne(name, event_names)

The problem is it picks the incorrect option:

('Ac Milan - Sparta Praha', 86)

where it should choose:

Feyen Rotte - Wolfsberger

Fuzz Ratio vs Partial Ratio

fuzz.ratio() is said to work well with very short and very long strings but less well with labels of 3 or 4 words, which is exactly the type of matching we need.

Here are a few tests done:

fuzz.ratio('Feyenoord Rotterdam - Wolfsberger AC', 'Ac Milan - Sparta Praha')
24

fuzz.ratio('Feyenoord Rotterdam - Wolfsberger AC', 'Feyen Rotte - Wolfsberger')
82

fuzz.partial_ratio('Feyenoord Rotterdam - Wolfsberger AC', 'Ac Milan - Sparta Praha')
26

fuzz.partial_ratio('Feyenoord Rotterdam - Wolfsberger AC', 'Feyen Rotte - Wolfsberger')
80
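To see roughly where a figure like 82 comes from, the calculation can be reproduced with Python's difflib. This is a sketch on the assumption that fuzz.ratio is a SequenceMatcher-style ratio (twice the number of matching characters divided by the combined length). `ratio_breakdown` is a name made up for this example.

```python
from difflib import SequenceMatcher


def ratio_breakdown(first, second):
    """Return (matched_chars, total_chars, score) for two strings."""
    matcher = SequenceMatcher(None, first, second)
    # Sum the sizes of all matching blocks found between the two strings.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(first) + len(second)
    score = round(100 * 2 * matched / total) if total else 100
    return matched, total, score


if __name__ == "__main__":
    scraped = 'Feyenoord Rotterdam - Wolfsberger AC'
    for candidate in ('Ac Milan - Sparta Praha', 'Feyen Rotte - Wolfsberger'):
        print(candidate, ratio_breakdown(scraped, candidate))
```

For the 'Feyen Rotte - Wolfsberger' pair, 25 of the 61 characters line up, and 2 × 25 / 61 ≈ 0.82, which agrees with the 82 reported above.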

It looks like ratio works better than partial in this case, so why was extractOne giving bad results?

When word order is an issue, the token_sort_ratio method is used. That is not really an issue here, as the home team is usually stated first when it comes to sport.

fuzz.token_sort_ratio('Feyenoord Rotterdam - Wolfsberger AC', 'Ac Milan - Sparta Praha')
36

fuzz.token_sort_ratio('Feyenoord Rotterdam - Wolfsberger AC', 'Feyen Rotte - Wolfsberger')
81

Specifying the Scorer

It turns out that the process.extract() and process.extractOne() methods let you specify a scorer.
The default scorer was evidently not the one I needed:

# Default Extract
process.extract(name, event_names)
[('Ac Milan - Sparta Praha', 86),
 ('Feyen Rotte - Wolfsberger', 82),
 ('Red Star Belgrade - FC Slovan Liberec', 47),
 ('R Antwerp - Tottenham', 45),
 ('Zorya Lugan - Braga', 40)]

# Fuzz Ratio scorer
process.extract(name, event_names, scorer=fuzz.ratio)
[('Feyen Rotte - Wolfsberger', 82),
 ('Red Star Belgrade - FC Slovan Liberec', 47),
 ('Zorya Lugan - Braga', 40),
 ('Aek Athen - Leicester', 39),
 ('Crvena Zvezd - Slovan Libe', 39)]

# Partial Ratio scorer
process.extract(name, event_names, scorer=fuzz.partial_ratio)
[('Feyen Rotte - Wolfsberger', 80),
 ('Red Star Belgrade - FC Slovan Liberec', 47),
 ('Crvena Zvezd - Slovan Libe', 46),
 ('Aek Athen - Leicester', 43),
 ('R Antwerp - Tottenham', 43)]

# Token Sort Ratio scorer
process.extract(name, event_names, scorer=fuzz.token_sort_ratio)
[('Feyen Rotte - Wolfsberger', 81),
 ('R Antwerp - Tottenham', 42),
 ('Aek Athen - Leicester', 38),
 ('Red Star Belgrade - FC Slovan Liberec', 38),
 ('Ac Milan - Sparta Praha', 36)]

So it is clear I should set the scorer to fuzz.ratio:

process.extractOne(name, event_names, scorer=fuzz.ratio)
('Feyen Rotte - Wolfsberger', 82)

It is also worth noting that you probably want the match threshold set above 80, so that weak matches are rejected rather than accepted.
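The idea of a threshold can be sketched as follows. To keep the example free of third-party installs it uses difflib as a stand-in scorer, so `simple_ratio` and `best_match` are illustrative names rather than fuzzywuzzy functions, and the cut-off of 80 is only the rough figure mentioned above. With fuzzywuzzy itself, extractOne accepts a score_cutoff argument that serves the same purpose.

```python
from difflib import SequenceMatcher


def simple_ratio(first, second):
    """0-100 similarity from difflib, a stand-in for fuzz.ratio."""
    return round(100 * SequenceMatcher(None, first, second).ratio())


def best_match(name, candidates, threshold=80, scorer=simple_ratio):
    """Return (candidate, score) for the top scorer, or None below threshold."""
    scored = [(candidate, scorer(name, candidate)) for candidate in candidates]
    if not scored:
        return None
    top = max(scored, key=lambda pair: pair[1])
    # Reject the best candidate too if it is not a convincing match.
    return top if top[1] >= threshold else None


if __name__ == "__main__":
    events = ['Ac Milan - Sparta Praha', 'Feyen Rotte - Wolfsberger', 'Lille - Celtic']
    print(best_match('Feyenoord Rotterdam - Wolfsberger AC', events))
    print(best_match('Arsenal - Chelsea', events))
```

Returning None for a weak match lets the calling code create a new event instead of attaching the odds to the wrong one.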

What is the default scorer?

If we look in the library's process.py:

default_scorer = fuzz.WRatio

The w stands for weighted, and this is the description of the function:

# w is for weighted
def WRatio(s1, s2, force_ascii=True, full_process=True):
    """
    Return a measure of the sequences' similarity between 0 and 100, using different algorithms.

    **Steps in the order they occur**

    #. Run full_process from utils on both strings
    #. Short circuit if this makes either string empty
    #. Take the ratio of the two processed strings (fuzz.ratio)
    #. Run checks to compare the length of the strings
        * If one of the strings is more than 1.5 times as long as the other
          use partial_ratio comparisons - scale partial results by 0.9
          (this makes sure only full results can return 100)
        * If one of the strings is over 8 times as long as the other
          instead scale by 0.6

    #. Run the other ratio functions
        * if using partial ratio functions call partial_ratio,
          partial_token_sort_ratio and partial_token_set_ratio
          scale all of these by the ratio based on length
        * otherwise call token_sort_ratio and token_set_ratio
        * all token based comparisons are scaled by 0.95
          (on top of any partial scalars)

    #. Take the highest value from these results
       round it and return it as an integer.

    :param s1:
    :param s2:
    :param force_ascii: Allow only ascii characters
    :type force_ascii: bool
    :full_process: Process inputs, used here to avoid double processing in extract functions (Default: True)
    :return:
    """
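One reading of these steps would explain the 86 given to the wrong event. The sketch below checks the two conditions that seem to matter: whether the length check switches WRatio to the partial comparisons, and whether the two names share a whole word. The helper names are hypothetical and `normalise` is only a rough approximation of full_process, so treat this as an illustration of the docstring rather than the library's code.

```python
import re


def normalise(text):
    """Lowercase and swap non-alphanumerics for spaces (rough full_process)."""
    return re.sub(r'[^a-z0-9]', ' ', text.lower()).strip()


def uses_partial(first, second):
    """True when one processed string is more than 1.5x the other's length."""
    a, b = normalise(first), normalise(second)
    if not a or not b:
        return False  # the docstring says empty strings short circuit
    return max(len(a), len(b)) / min(len(a), len(b)) > 1.5


def shared_tokens(first, second):
    """Words that appear in both processed strings."""
    return set(normalise(first).split()) & set(normalise(second).split())


if __name__ == "__main__":
    scraped = 'Feyenoord Rotterdam - Wolfsberger AC'
    for candidate in ('Ac Milan - Sparta Praha', 'Feyen Rotte - Wolfsberger'):
        print(candidate, uses_partial(scraped, candidate),
              shared_tokens(scraped, candidate))
```

'Ac Milan - Sparta Praha' is short enough (23 characters against 36) to trigger the partial comparisons, and it shares the word "ac" with the scraped name. If the partial token set comparison treats a shared word as a full match, which is my understanding of the library and not something the docstring states, the result is 100 × 0.9 × 0.95 = 85.5, which rounds to the 86 seen earlier. 'Feyen Rotte - Wolfsberger' is 25 characters long, so it stays under the 1.5 limit and is scored without the partial comparisons.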

Hope this post helps you...

Another library I found that may make your life easier, so that you don't even have to use fuzzywuzzy, is recordlinker. It takes two separate data sources and links them together. I still need to check that out.
