Saturday 27 March 2021

National Statistics Postcode Lookup Radius Search With Redis

Of all the questions posed by Plato, the profundity of one stands head and shoulders above the rest:

To answer Plato's question we're going to need some geographic information about UK postcodes:

National Statistics Postcode Lookup

This data set is probably the right one for the job. It's from a reliable source, it contains longitude and latitude for 2.6 million postcodes and, best of all, it's free.

The data is downloadable from geoportal.statistics.gov.uk, first item under the 'Postcodes' menu. The dataset appears to be released quarterly every February, May, August and November.

At the time of writing, the latest download link points to:

www.arcgis.com/sharing/rest/content/items/7606baba633d4bbca3f2510ab78acf61/data

Interestingly, the domain is www.arcgis.com, the website for ArcGIS, a well-known commercial Geographic Information System from Esri.

Other data sets are available

Code-Point Open

Code-Point Open from Ordnance Survey is free, but location information is coded as Eastings and Northings, which is not ideal for this project.

PostZon

Part of the PAF datasets from Royal Mail and mentioned in the PAF Programmers Guide. It has longitude and latitude, but not much information beyond that. Non-free, and apparently leaked by Wikileaks in 2009:

Was the leak of Royal Mail's PostZon database a good or bad thing?

UK Postcodes to Longitudes Latitudes Table

Provided by postcodeaddressfile.co.uk, a Royal Mail reseller. Appears to be a combination of PAF and OS data, and has longitude and latitude data, but costs £199 for an Organisation Licence.

Geospatial Index

Redis provides geospatial indexing and a bunch of related commands, which is awesome as long as you can provide it with longitude and latitude data:

Ideal for answering the question "How many postcodes are within a given radius of a given postcode" is the GEORADIUSBYMEMBER command.
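Redis keeps geo members in a sorted set, with each longitude/latitude pair packed into a 52-bit geohash that serves as the score. Here's a rough Python sketch of that idea. The function names are mine, the London coordinates are approximate, and the exact bit layout inside Redis is an assumption, so treat it as an illustration rather than a reimplementation.

```python
# Illustrative sketch only: a 52-bit interleaved geohash-style score.
# The limits are the coordinate ranges Redis accepts for GEOADD; the precise
# bit ordering Redis uses internally is an assumption here.
LAT_MIN, LAT_MAX = -85.05112878, 85.05112878
LON_MIN, LON_MAX = -180.0, 180.0
STEP = 26  # bits per coordinate, 52 bits in total


def quantise(value, lo, hi, bits=STEP):
    """Map a value in [lo, hi] onto an integer cell index in [0, 2**bits)."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    cell = int((value - lo) / (hi - lo) * (1 << bits))
    return min(cell, (1 << bits) - 1)  # clamp the upper edge into the last cell


def interleave(lat_cell, lon_cell, bits=STEP):
    """Interleave bits: latitude in even positions, longitude in odd positions."""
    score = 0
    for i in range(bits):
        score |= ((lat_cell >> i) & 1) << (2 * i)
        score |= ((lon_cell >> i) & 1) << (2 * i + 1)
    return score


def geo_score(lon, lat):
    """Pack a longitude/latitude pair into a single 52-bit integer."""
    return interleave(quantise(lat, LAT_MIN, LAT_MAX),
                      quantise(lon, LON_MIN, LON_MAX))


def shared_prefix_bits(a, b):
    """Count how many leading bits (out of 52) two scores have in common."""
    return 2 * STEP - (a ^ b).bit_length()


york_a = geo_score(-1.0930296778678894, 53.95831391882791195)  # YO24 1AB
york_b = geo_score(-1.0816839337348938, 53.96135558421912037)  # YO1 7HH
london = geo_score(-0.1276, 51.5072)  # approximate central London, for contrast

print("York pair shares", shared_prefix_bits(york_a, york_b), "leading bits")
print("York/London share", shared_prefix_bits(york_a, london), "leading bits")
```

Nearby points tend to share a long common prefix, which is what lets a radius search be answered with a handful of sorted-set range scans instead of a walk over every member.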

Data Load

This bash script downloads the February 2021 release of the National Statistics Postcode Lookup ZIP file, unzips the file we need, parses the data and formats it into Redis commands, which are piped to Redis.

The script uses the csvtool command line utility which will need to be installed if you don't already have it.

load-nspl.sh

#!/bin/bash
# Data URL from: https://geoportal.statistics.gov.uk/datasets/national-statistics-postcode-lookup-february-2021
DATA_URL='https://www.arcgis.com/sharing/rest/content/items/7606baba633d4bbca3f2510ab78acf61/data'
ZIP_FILE='/tmp/nspl.zip'
CSV_FILE='/tmp/nspl.csv'
CSV_REGEX='NSPL.*UK\.csv'
REDIS_KEY='nspl' # NSPL - National Statistics Postcode Lookup
POSTCODE_FIELD=3 # PCDS - Unit postcode variable length version
LAT_FIELD=34 # LAT - Decimal degrees latitude
LONG_FIELD=35 # LONG - Decimal degrees longitude
START_TIME="$(date -u +%s)"

# Download data file if it doesn't exist
if [ -f "$ZIP_FILE" ]
then
    echo "'$ZIP_FILE' exists, skipping download"
else
    echo "Downloading '$ZIP_FILE'"
    wget "$DATA_URL" -O "$ZIP_FILE"
fi

# Unzip data if it doesn't exist
if [ -f "$CSV_FILE" ]
then
    echo "'$CSV_FILE' exists, skipping unzipping"
else
    echo "Unzipping data to '$CSV_FILE'"
    unzip -p "$ZIP_FILE" "$(unzip -Z1 "$ZIP_FILE" | grep -E "$CSV_REGEX")" > "$CSV_FILE"
fi

# Process data file, create Redis commands, pipe to redis-cli
echo "Processing data file '$CSV_FILE'"
csvtool format "GEOADD $REDIS_KEY %($LONG_FIELD) %($LAT_FIELD) \"%($POSTCODE_FIELD)\"\n" $CSV_FILE \
| redis-cli --pipe

# Done
END_TIME="$(date -u +%s)"
ELAPSED_TIME="$(($END_TIME-$START_TIME))"
MEMBERS=$(echo "zcard $REDIS_KEY" | redis-cli | cut -f 1)
echo "$MEMBERS postcodes loaded"
echo "Elapsed: $ELAPSED_TIME seconds"

Expect output from the script similar to this:

Downloading '/tmp/nspl.zip'
...
196050K ......                                                100% 47.2M=54s
...
Unzipping data to '/tmp/nspl.csv'
Processing data file '/tmp/nspl.csv'
...
ERR invalid longitude,latitude pair 0.000000,99.999999
...
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 23258, replies: 2656252
2632994 postcodes loaded
Elapsed: 18 seconds

Don't worry about the errors:

ERR invalid longitude,latitude pair 0.000000,99.999999

There are about 23,000 entries in the data file with invalid longitude and latitude values, which Redis will reject. The NSPL User Guide (available in the downloaded ZIP file - NSPL User Guide Feb 2021.pdf) has this to say about them:

"Decimal degrees latitude - The postcode coordinates in degrees latitude to six decimal places; 99.999999 for postcodes in the Channel Islands and the Isle of Man, and for postcodes with no grid reference."

and

"Decimal degrees longitude - The postcode coordinates in degrees longitude to six decimal places; 0.000000 for postcodes in the Channel Islands and the Isle of Man, and for postcodes with no grid reference."

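If you'd rather not lean on Redis to reject those rows, they can be filtered out before the commands are generated. The sketch below is one way to do that in Python. It assumes the CSV header names the columns pcds, lat and long (matching the field comments in the script), and the sample rows in the usage example are made up for illustration.

```python
import csv
import io

# Coordinate limits Redis accepts for GEOADD.
LON_RANGE = (-180.0, 180.0)
LAT_RANGE = (-85.05112878, 85.05112878)


def geoadd_commands(csv_file, key="nspl"):
    """Yield a GEOADD command for each row whose coordinates Redis would accept."""
    for row in csv.DictReader(csv_file):  # DictReader consumes the header row
        try:
            lon, lat = float(row["long"]), float(row["lat"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric coordinates
        if not (LON_RANGE[0] <= lon <= LON_RANGE[1]
                and LAT_RANGE[0] <= lat <= LAT_RANGE[1]):
            continue  # e.g. the 99.999999 latitude placeholder
        # Reuse the original text of each field so no precision is lost.
        yield f'GEOADD {key} {row["long"]} {row["lat"]} "{row["pcds"]}"'


# Made-up sample: one usable row and one carrying the placeholder values.
sample = io.StringIO(
    "pcds,lat,long\n"
    "YO24 1AB,53.958314,-1.093030\n"
    "JE2 3AB,99.999999,0.000000\n"
)
for command in geoadd_commands(sample):
    print(command)
```

The output lines have the same shape as the ones the csvtool format string produces, so they could be piped to redis-cli --pipe in the same way.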
Queries

Once we've got a full dataset loaded we can run some queries with redis-cli:

127.0.0.1:6379> geopos nspl "YO24 1AB"
1) 1) "-1.0930296778678894"
   2) "53.95831391882791195"
127.0.0.1:6379> geopos nspl "YO1 7HH"
1) 1) "-1.0816839337348938"
   2) "53.96135558421912037"
127.0.0.1:6379> geodist nspl "YO24 1AB" "YO1 7HH" km
"0.8159"
127.0.0.1:6379> georadiusbymember nspl "YO24 1AB" 100 m WITHDIST
1) 1) "YO24 1AY"
   2) "29.0576"
2) 1) "YO1 6HT"
   2) "2.0045"
3) 1) "YO2 2AY"
   2) "2.0045"
4) 1) "YO24 1AB"
   2) "0.0000"
5) 1) "YO24 1AA"
   2) "69.7119"
127.0.0.1:6379> georadiusbymember nspl "YO1 7HH" 50 m WITHDIST
1) 1) "YO1 2HT"
   2) "32.6545"
2) 1) "YO1 7HT"
   2) "32.6545"
3) 1) "YO1 7HH"
   2) "0.0000"
4) 1) "YO1 2HZ"
   2) "40.3405"
5) 1) "YO1 2HL"
   2) "37.6516"
6) 1) "YO1 7HL"
   2) "38.9421"
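As a sanity check on those numbers, the distance can be recomputed outside Redis with the haversine formula, using the coordinates GEOPOS returned. This is a minimal sketch: the Earth radius constant is the one I believe Redis uses, and within_radius is a hypothetical brute-force helper, not how Redis does it.

```python
from math import asin, cos, radians, sin, sqrt

# Sphere radius in metres; assumed to match the constant Redis uses.
EARTH_RADIUS_M = 6372797.560856

# Coordinates as returned by GEOPOS: name -> (longitude, latitude)
POINTS = {
    "YO24 1AB": (-1.0930296778678894, 53.95831391882791195),
    "YO1 7HH": (-1.0816839337348938, 53.96135558421912037),
}


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two longitude/latitude points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_radius(points, origin, radius_m):
    """Brute-force radius search: every point within radius_m of origin."""
    lon0, lat0 = points[origin]
    hits = []
    for name, (lon, lat) in points.items():
        distance = haversine_m(lon0, lat0, lon, lat)
        if distance <= radius_m:
            hits.append((name, round(distance, 4)))
    return sorted(hits, key=lambda hit: hit[1])  # nearest first


metres = haversine_m(*POINTS["YO24 1AB"], *POINTS["YO1 7HH"])
print(f"{metres / 1000:.4f} km")  # should land close to the GEODIST figure
print(within_radius(POINTS, "YO24 1AB", 1000))
```

The brute-force search visits every point, which is fine for two postcodes but not for 2.6 million, and that's the gap the Redis geospatial index fills.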

REST API

Here's a super basic Flask-based REST service to query the geographic index. Postcode, distance and units are provided as path segments in the request URL. Postcodes within the requested radius are returned as JSON, along with their distance from the provided postcode.

nspl-rest.py

from flask import Flask, jsonify
from redis import Redis


REDIS_HOST = 'localhost'
REDIS_PORT = 6379
REDIS_DB = 0
REDIS_KEY = 'nspl'

app = Flask(__name__)
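# decode_responses=True returns member names as str rather than bytes,
# which keeps them JSON serialisable
r = Redis(host=REDIS_HOST, port=REDIS_PORT, db=REDIS_DB, decode_responses=True)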


@app.route('/radius/<postcode>/<distance>/<unit>', methods=['GET'])
def radius(postcode, distance, unit):

    try:
        results = r.georadiusbymember(REDIS_KEY,
                                      postcode, distance, unit,
                                      withdist=True)
    except Exception:
        # Unknown postcode, bad distance/unit or Redis unavailable: return an empty list
        results = []

    return jsonify([{
        'postcode': result[0],
        'distance': result[1]
    } for result in results])


if __name__ == '__main__':
    app.run()
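The service passes the URL values straight to Redis, so a request for yo241ab or an unsupported unit simply comes back empty. A small validation step could tidy the inputs first. The helpers below are hypothetical additions, not part of nspl-rest.py, and the postcode check is only a plausibility test based on the single-space style of the stored members (for example "YO24 1AB").

```python
import math

# Units accepted by the Redis geo commands.
VALID_UNITS = {"m", "km", "ft", "mi"}


def normalise_postcode(raw):
    """Return raw in the single-space style, e.g. 'yo241ab' -> 'YO24 1AB'."""
    compact = "".join(raw.split()).upper()
    # UK postcodes have 5 to 7 alphanumeric characters once spaces are removed.
    if not 5 <= len(compact) <= 7 or not compact.isalnum():
        raise ValueError(f"not a plausible postcode: {raw!r}")
    # The inward code is always the last three characters.
    return f"{compact[:-3]} {compact[-3:]}"


def parse_radius(distance, unit):
    """Validate the distance and unit path segments, returning (float, str)."""
    unit = unit.lower()
    if unit not in VALID_UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    value = float(distance)  # raises ValueError for non-numeric input
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"distance must be a positive number: {distance!r}")
    return value, unit


print(normalise_postcode("yo241ab"))
print(parse_radius("100", "M"))
```

Inside the route, a ValueError from either helper could be turned into a 400 response instead of an empty result, which makes typos easier to spot from the client side.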

API Example Usage

$ curl localhost:5000/radius/YO24%201AB/100/m | json_pp
[
   {
      "distance" : 29.0576,
      "postcode" : "YO24 1AY"
   },
   {
      "distance" : 2.0045,
      "postcode" : "YO1 6HT"
   },
   {
      "distance" : 2.0045,
      "postcode" : "YO2 2AY"
   },
   {
      "distance" : 0,
      "postcode" : "YO24 1AB"
   },
   {
      "distance" : 69.7119,
      "postcode" : "YO24 1AA"
   }
]

Source Code