Saturday, 6 July 2019

Raspberry Pi Backup Server

Getting Old

Recently I've found myself lying awake at night worrying if my documents, code and photos are backed up and recoverable. Or to put it another way - I've officially become old :-(

With a new Raspberry Pi 4B on order it's time to re-purpose the old Raspberry Pi 3B to create a backup solution.

Hardware

I want my backup solution and backup media to be small, cheap and redundant. Speed isn't really an issue, so I've chosen micro SD as my backup media for this project.

I've picked up an Anker 4-Port USB hub, 2 SanDisk 64 GB micro SD cards and 2 SanDisk MobileMate micro SD card readers. I ordered this kit from Amazon and the prices at the time of writing were:

ComponentPrice
Anker 4-Port USB 3.0 Ultra Slim Data Hub £10.99
SanDisk Ultra 64 GB microSDXC £11.73
SanDisk MobileMate USB 3.0 Reader £7.50

They fit together really well, with room for two more SD cards and readers if I need to expand:

The plan is to make one of the SD cards available over the network as a share, via the Pi using SAMBA. The share can be mapped as a Windows network drive and files can easily be dragged and dropped for backup. In case the first backup SD card fails, the Pi will copy the files and folders from the first SD card to the second SD card using rsync to create a backup of the backup.

Software

Download and upgrade the Pi 3B to the lastest version of Raspbian. I've chosen Rapbian Lite to save a bit of space on the Pi's SD card:

https://downloads.raspberrypi.org/raspbian_lite_latest

At the time of writing the lastest download was: 2019-06-20-raspbian-buster-lite.zip

Write the OS to the Pi's SD card using Etcher. Top tip - Etcher can write a .zip file, but it's much quicker to extract the .iso file from the .zip file and write that instead.

Don't forget to add an empty ssh file to the boot partition on the Pi's SD card if you are going to run the Pi headless.

Put the Pi's SD card into the Pi, attached the USB hub and micro SD cards, and boot the Pi and login via SSH. Update and upgrade any new packages first, enable unattended security updates and install your editor of choice:

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install unattended-upgrades
$ sudo apt-get install vim

Because I've got a Pi 4 on the way, I want to call this Pi 'raspberrypi3'. Modify the /etc/hostname and /etc/hosts files:

$ sudo vim /etc/hostname

raspberrypi3
$ sudo vim /etc/hosts

127.0.1.1       raspberrypi3
$ sudo reboot

At this point, the backup SD cards should be available to Linux as devices /dev/sda and /dev/sdb.

I want the backup SD cards to be readable on Linux and Windows machines using the exFAT file system. A good tutorial on how to do this on Linux using FUSE and gdisk is available here:

https://matthew.komputerwiz.net/2015/12/13/formatting-universal-drive.html

$ sudo apt-get install exfat-fuse exfat-utils
$ sudo apt-get install gdisk

Use gdisk to remove any existing partitions, create a new partition and write this to the SD cards. Make sure to create the new partition as type 0700 (Microsoft basic data) when prompted:

$ sudo gdisk /dev/sda

GPT fdisk (gdisk) version 0.8.8

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries.

Command (? for help):
Command (? for help): o
This option deletes all partitions and creates a new protective MBR.
Proceed? (Y/N): Y
Command (? for help): n
Partition number (1-128, default 1):
First sector (34-16326462, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-16326462, default = 16326462) or {+-}size{KMGTP}:
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): 0700
Changed type of partition to 'Microsoft basic data'
Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): Y
OK; writing new GUID partition table (GPT) to /dev/sda.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.

Repeat for the second SD card:

$ sudo gdisk /dev/sdb

Create exFAT partitions on both SD cards and label the partitions PRIMARY and SECONDARY:

$ sudo mkfs.exfat /dev/sda1
$ sudo exfatlabel /dev/sda1 PRIMARY
$ sudo mkfs.exfat /dev/sdb1
$ sudo exfatlabel /dev/sdb1 SECONDARY

Create directories to mount the new partitions on:

$ sudo mkdir -p /media/usb/backup/primary
$ sudo mkdir -p /media/usb/backup/secondary

Modify /etc/fstab to mount the SD cards by partition label. This allows us to mount the correct card regardless of it's device path or UUID:

$ sudo vim /etc/fstab

LABEL=PRIMARY /media/usb/backup/primary exfat defaults 0 0
LABEL=SECONDARY /media/usb/backup/secondary exfat defaults 0 0

Mount the SD cards:

$ sudo mount /media/usb/backup/primary
$ sudo mount /media/usb/backup/secondary

Create a cron job to rsync files from the primary card to the secondary card. The following entry syncs the files every day at 4am:

$ sudo crontab -e

0 4 * * * rsync -av --delete /media/usb/backup/primary/ /media/usb/backup/secondary/

To sync files immediately, rsync can be run from the command line at any time with:

$ sudo rsync -av --delete /media/usb/backup/primary/ /media/usb/backup/secondary/

To make the primary SD card available as a Windows share, install and configure SAMBA:

$ sudo apt-get install samba samba-common-bin
$ sudo vim /etc/samba/smb.conf

[backup]
   comment = Pi backup share
   path = /media/usb/backup/primary
   public = yes
   browseable = yes
   writable = yes
   create mask = 0777
   directory mask = 0777

$ sudo service smbd restart

Finally, install and configure UFW firewall, allowing incoming connections for SSH and SAMBA only:

$ sudo apt-get install ufw
$ sudo ufw default deny incoming
$ sudo ufw default allow outgoing
$ sudo ufw allow ssh
$ sudo ufw allow samba
$ sudo ufw enable

Saturday, 9 March 2019

Card Table

Card Table is a multi-player web based virtual card table implemented using Java, plain JavaScript, WebSockets and Postgres.

Source Code

Code available in GitHub - card-table

Setup

This project requires a minimum of Java 8 JDK to build and a Postgres installation.

A drop/create Postgres SQL script needs to be run to create and initalise the database with default data:
src/main/resources/sql/drop-create-tables.sql

Configure the Java web application's database dev configuration:
src/main/resources/config/dev.properties

Build and Run

Build and run using Maven with an embedded Tomcat:

mvn clean install tomcat7:run-war

Browse to:

http://localhost:8080/cardtable

A new card table will be created with a unique URL. If this project is deployed to a publicly available host, the URL can be shared with other players to play against.

Mouse Controls

Packs of cards can be dragged from the side bar and dropped on the table to create a new deck. Currently there are 2 decks - both standard 52 card decks, one with a black back and one with a red back.

Single cards can be clicked and dragged to move them around the table. Multiple cards can be selected by clicking and dragging the mouse and drawing a selection box around the cards to be selected. Selected cards can be clicked and dragged to move more than one card.

Clicking a single card will turn the card face up/face down. Clicking multiple selected cards will shuffle the selected cards.

Moving cards to the bottom of the table, below the green line, hides them from other players. Any card actions which take place here, e.g. moving, turning and shuffling will not be broadcast to other players.

Dragging single or multiple cards off the screen removes them from the table.

See the video above for examples of all these actions.

Supported Browsers

Currently only desktop browsers are supported due to the lack of native drag-and-drop JavaScript support on mobile devices. At the time of writing, Card Table has been tested on Chrome 72, Firefox 65, Edge 42, IE 11 and Opera 58.

Wednesday, 29 August 2018

Java 9/10 Multiline String

My Java Multiline String project stopped building when compiling with Java 10 because tools.jar has been removed since Java 9.

When the tools.jar dependency is specified like this:

pom.xml

...
<dependencies>
  <dependency>
    <groupId>sun.jdk</groupId>
    <artifactId>tools</artifactId>
    <version>LATEST</version>
    <scope>system</scope>
    <systemPath>${java.home}/../lib/tools.jar</systemPath>
  </dependency>
</dependencies>
...

The build failed with output:

------------------------------------------------------------------------
BUILD FAILURE
------------------------------------------------------------------------
Total time: 0.347 s
Finished at: 2018-08-29T21:10:41+01:00
Final Memory: 6M/24M
------------------------------------------------------------------------
Failed to execute goal on project multiline-string: Could not resolve dependencies for project org.adrianwalker:multiline-string:jar:0.2.1: Could not find artifact sun.jdk:tools:jar:LATEST at specified path /usr/local/jdk-10.0.1/../lib/tools.jar -> [Help 1]

Simply removing the dependency fixes the build and the project compiles without error. So where are the classes which were in the tools.jar com.sun.tools.javac packages?

In JDK versions 1.8 and lower:

cd /usr/local/jdk1.8.0_172
unzip -l ./lib/tools.jar | grep com/sun/tools/javac/tree/TreeMaker.class
    47366  2018-03-28 21:40   com/sun/tools/javac/tree/TreeMaker.class

In JDK version 10:

cd /usr/local/jdk-10.0.1
unzip -l ./jmods/jdk.compiler.jmod | grep com/sun/tools/javac/tree/TreeMaker.class
warning [./jmods/jdk.compiler.jmod]:  4 extra bytes at beginning or within zipfile
  (attempting to process anyway)
    64266  2018-03-26 18:16   classes/com/sun/tools/javac/tree/TreeMaker.class

I still want to be able to compile this library will all JDK versions from 1.6 onwards without creating another project for versions 9 and 10. To do this we can move the tools.jar dependency to a profile which is only activated for older JDKs:

pom.xml

...
<profiles>
  <profile>
    <activation>
      <jdk>[1.6,9)</jdk>
    </activation>
    <dependencies>
      <dependency>
        <groupId>sun.jdk</groupId>
        <artifactId>tools</artifactId>
        <version>LATEST</version>
        <scope>system</scope>
        <systemPath>${java.home}/../lib/tools.jar</systemPath>
      </dependency>
    </dependencies>
  </profile>
</profiles>
...

The line <jdk>[1.6,9)</jdk> specifies a version range using the Apache Maven Enforcer range syntax. In this case, include all versions from 1.6 upto but not including 9.

Aside from pom.xml changes, the Java code and usage remains identical to the original project.

Java 9/10 module system

This all only works because the maven-compiler-plugin is configured with source and target set to 1.6:

pom.xml

...
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.6</source>
    <target>1.6</target>
  </configuration>
</plugin>
...

If we want to use Java 9/10 lanuage features, setting source and target to 10 will give these errors:

Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project multiline-string: Compilation failure: Compilation failure:
org/adrianwalker/multilinestring/MultilineProcessor.java:[3,27] package com.sun.tools.javac.model is not visible
(package com.sun.tools.javac.model is declared in module jdk.compiler, which does not export it to the unnamed module)
org/adrianwalker/multilinestring/MultilineProcessor.java:[4,27] package com.sun.tools.javac.processing is not visible
(package com.sun.tools.javac.processing is declared in module jdk.compiler, which does not export it)
org/adrianwalker/multilinestring/MultilineProcessor.java:[5,27] package com.sun.tools.javac.tree is not visible
(package com.sun.tools.javac.tree is declared in module jdk.compiler, which does not export it to the unnamed module)
org/adrianwalker/multilinestring/MultilineProcessor.java:[6,27] package com.sun.tools.javac.tree is not visible
(package com.sun.tools.javac.tree is declared in module jdk.compiler, which does not export it to the unnamed module)

In this case we must correctly use the new Java Module System. To resolve the above errors first we need a module-info.java in the project root specifying a module name and the module's requirements:

module-info.java

module org.adrianwalker.multilinestring {
  requires jdk.compiler;
}

Next we need to export the required packages in the jdk.compiler module and make them visible to our org.adrianwalker.multilinestring module:

pom.xml

...
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <version>3.8.0</version>
  <configuration>
    <source>10</source>
    <target>10</target>
    <compilerArgs>
      <arg>--add-exports</arg>
      <arg>jdk.compiler/com.sun.tools.javac.model=org.adrianwalker.multilinestring</arg>
      <arg>--add-exports</arg>
      <arg>jdk.compiler/com.sun.tools.javac.processing=org.adrianwalker.multilinestring</arg>
      <arg>--add-exports</arg>
      <arg>jdk.compiler/com.sun.tools.javac.tree=org.adrianwalker.multilinestring</arg>
    </compilerArgs>
  </configuration>
</plugin>
...

And now the project should build without errors and work just as before.

Source Code

Monday, 27 August 2018

Enforcing Multi-Tier Architecture

So you've designed an application, using the principals of separation of concerns and a multi-tier architecture. It's a delight to navigate and maintain the code base, the architecture might look something like this:

The presentation layer talks to the application layer, which talks to the data access layer. The facade object provides a high-level interface for API consumers, talking to the service objects, which call objects encapsulating business logic, which operate on data provided by the data access objects. Life is good.

Eventually other programmers will have to maintain and add new features to your application, possibly in your absence. How do you communicate your design intentions to future maintainers? The above diagram, a bit of documentation, and some programming rigour should suffice. Back in the real world, programmers face time pressures which prevent them creating and updating documentation, and managers and customers don't care about code maintainability - they want their features yesterday. When getting the code into production as fast as possible is the only focus, clean code and architecture are soon forgotten.

To quote John Carmack:

"It’s just amazing how many mistakes and how bad programmers can be. Everything that is syntactically legal, that the compiler will accept, will eventually wind up in your code base."

Carmack was talking about the usefulness of static typing here, but the same problem also applies to code architecture: over time, whatever can happen, will happen. Your well designed architecture will risk turning into spaghetti code, with objects calling methods from any layer:

To address this problem I think it would be useful to have a way of documenting and enforcing which objects can invoke a method on another object. In Java this can be achieved with a couple of annotations and some aspect oriented programming. Below is an annotation named CallableFrom which can be used to annotate methods on a class indicating what classes and interface implementations the method can be called from.

CallableFrom.java

package org.adrianwalker.callablefrom;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import org.adrianwalker.callablefrom.test.TestCaller;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface CallableFrom {

  CallableFromClass[] value() default {
    @CallableFromClass(TestCaller.class)
  };
}

The annotation's value method returns an array of another annotation CallableFromClass:

CallableFromClass.java

package org.adrianwalker.callablefrom;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.ANNOTATION_TYPE)
@Retention(RetentionPolicy.RUNTIME)
public @interface CallableFromClass {

  Class value();

  boolean subclasses() default true;
}

The annotation's value method returns a Class object - the class (or interface) of an object which is allowed to call the annotated method. The annotation's subclasses method returns a boolean value which flags if subclasses (or interface implementations) are allowed to call the annotated method.

At this point the annotations do nothing, we need a way of enforcing the behaviour specified by the annotations. This can be achieved using an AspectJ aspect class:

CallableFromAspect.java

package org.adrianwalker.callablefrom;

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

@Aspect
public final class CallableFromAspect {

  @Before("@annotation(callableFrom) && call(* *.*(..))")
  public void before(final JoinPoint joinPoint, final CallableFrom callableFrom) throws CallableFromError {

    Class callingClass = joinPoint.getThis().getClass();
    boolean isCallable = isCallable(callableFrom, callingClass);

    if (!isCallable) {
      Class targetClass = joinPoint.getTarget().getClass();
      throw new CallableFromError(targetClass, callingClass);
    }
  }

  private boolean isCallable(final CallableFrom callableFrom, final Class callingClass) {

    boolean callable = false;
    CallableFromClass[] callableFromClasses = callableFrom.value();

    for (CallableFromClass callableFromClass : callableFromClasses) {

      Class clazz = callableFromClass.value();
      boolean subclasses = callableFromClass.subclasses();

      callable = (subclasses && clazz.isAssignableFrom(callingClass))
              || (!subclasses && clazz.equals(callingClass));

      if (callable) {
        break;
      }
    }

    return callable;
  }
}

The aspect intercepts any calls to methods annotated with @CallableFrom, gets the calling object's class and compares it to the class objects specified by the @CallableFromClass's class values. If subclasses is true (the default), the calling class can be a subclass (or implementation) of the class object specified by @CallableFromClass. If subclasses is false the calling class must be equal to the class object specified by @CallableFromClass.

If the above conditions are not met, for any of the @CallableFromClass annotations, the method is not callable from the calling class and a CallableFromError error is thrown. CallableFromError extends Error rather than Exception as it is not expected that application code should ever to attempt to catch it.

CallableFromError.java

package org.adrianwalker.callablefrom;

public final class CallableFromError extends Error {

  private static final String EXCEPTION_MESSAGE = "%s is not callable from %s";

  public CallableFromError(final Class targetClass, final Class callingClass) {

    super(String.format(EXCEPTION_MESSAGE,
            targetClass.getCanonicalName(),
            callingClass.getCanonicalName()));
  }
}

For example, if you have a class named Callable and you only want to be able to call it from another class named CallableCaller, no subclasses:

Callable.java

package org.adrianwalker.callablefrom;

public final class Callable {

  @CallableFrom({
    @CallableFromClass(value=CallableCaller.class, subclasses=false)
  })
  public void doStuff() {

    System.out.println("Callable doing stuff");
  }
}

Another example, if you had some business logic encapsulated in an object which should only be called by a service object and test classes:

UpperCaseBusinessObject.java

package org.adrianwalker.callablefrom.example.application;

import org.adrianwalker.callablefrom.CallableFrom;
import org.adrianwalker.callablefrom.CallableFromClass;
import org.adrianwalker.callablefrom.test.TestCaller;

public final class UpperCaseBusinessObject implements ApplicationLayer {

  @CallableFrom({
    @CallableFromClass(value = MessageService.class, subclasses = false),
    @CallableFromClass(value = TestCaller.class, subclasses = true)
  })
  public String uppercaseMessage(final String message) {

    if (null == message) {
      return null;
    }

    return message.toUpperCase();
  }
}

Testing

To make classes callable from JUnit tests, the unit test class should implement the TestCaller interface. This interface is the default value for the CallableFrom annotation:

CallableFromTest.java

package org.adrianwalker.callablefrom;

import org.adrianwalker.callablefrom.test.TestCaller;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;
import org.junit.Test;

public final class CallableFromTest implements TestCaller {

  @Test
  public void testCallableFromTestCaller() {

    CallableCaller cc = new CallableCaller(new Callable());
    cc.doStuff();
  }

  @Test
  public void testCallableFromError() {

    ErrorCaller er = new ErrorCaller(new CallableCaller(new Callable()));

    try {

      er.doStuff();
      fail("Expected CallableFromError to be thrown");

    } catch (final CallableFromError cfe) {

      String expectedMessage
              = "org.adrianwalker.callablefrom.Callable "
              + "is not callable from "
              + "org.adrianwalker.callablefrom.ErrorCaller";
      String actualMessage = cfe.getMessage();

      assertEquals(expectedMessage, actualMessage);
    }
  }

  @Test
  public void testNotCallableFromSubclass() {

    CallableCallerSubclass ccs = new CallableCallerSubclass(new Callable());

    try {

      ccs.doStuff();
      fail("Expected CallableFromError to be thrown");

    } catch (final CallableFromError cfe) {

      String expectedMessage
              = "org.adrianwalker.callablefrom.Callable "
              + "is not callable from "
              + "org.adrianwalker.callablefrom.CallableCallerSubclass";
      String actualMessage = cfe.getMessage();

      assertEquals(expectedMessage, actualMessage);
    }
  }
}

Where CallableCaller can be called from implementations of TestCaller:

CallableCaller.java

package org.adrianwalker.callablefrom;

import org.adrianwalker.callablefrom.test.TestCaller;

public class CallableCaller {

  private final Callable callable;

  public CallableCaller(final Callable callable) {

    this.callable = callable;
  }

  @CallableFrom({
    @CallableFromClass(value=ErrorCaller.class, subclasses = false),
    @CallableFromClass(value=TestCaller.class, subclasses = true)
  })
  public void doStuff() {

    System.out.println("CallableCaller doing stuff");

    callable.doStuff(); // callable from here
  }
}

Usage

Using the callable-from library in a project requires the aspect to be weaved into your code at build time. Using Apache Maven, this means using the AspectJ plugin and specifying callable-from as a weave dependency:

pom.xml

<build>
  <plugins>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>aspectj-maven-plugin</artifactId>
      <version>1.11</version>

      <configuration>
        <complianceLevel>1.8</complianceLevel>
        <weaveDependencies>
          <weaveDependency>  
            <groupId>org.adrianwalker.callablefrom</groupId>
            <artifactId>callable-from</artifactId>
          </weaveDependency>
        </weaveDependencies>
      </configuration>

      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>test-compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Overhead

Checking every annotated method introduces significant overhead, I've bench-marked the same code compiled an run with and without the aspect weaved at compile time:

BenchmarkTest.java

package org.adrianwalker.callablefrom.example;

import java.util.Random;
import org.adrianwalker.callablefrom.CallableFrom;
import org.adrianwalker.callablefrom.CallableFromClass;
import org.junit.Test;

public final class BenchmarkTest {

  private static class CallableFromRandomNumberGenerator {

    private static final Random RANDOM = new Random(System.currentTimeMillis());

    @CallableFrom({
      @CallableFromClass(value = BenchmarkTest.class, subclasses = false)
    })
    public int nextInt() {

      return RANDOM.nextInt();
    }
  }

  @Test
  public void testBenchmarkCallableFrom() {

    long elapsed = generateRandomNumbers(1_000_000_000);

    System.out.printf("%s milliseconds\n", elapsed);
  }

  private long generateRandomNumbers(final int n) {

    CallableFromRandomNumberGenerator cfrng = new CallableFromRandomNumberGenerator();

    long start = System.currentTimeMillis();

    for (long i = 0; i < n; i++) {
      cfrng.nextInt();
    }

    long end = System.currentTimeMillis();

    return end - start;
  }
}

Without aspect weaving:

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.adrianwalker.callablefrom.example.BenchmarkTest
13075 milliseconds

With aspect weaving:

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.adrianwalker.callablefrom.example.BenchmarkTest
81951 milliseconds

13075 milliseconds vs 81951 milliseconds means the above code took 6.3 times longer to execute with @CallableFrom checking enabled. For this reason, if execution speed is important to you, I'd recommend only weaving the aspect for a test build profile and using another build profile, without the AspectJ plugin, for building your release artifacts (see the callable-from-usage project pom.xml for an example).

Conclusions

So is this the worst idea ever in the history of programming? Speed issues aside, it probably is because:

  1. I've never seen a language that offers this sort of method call enforcement as standard.
  2. An object in layer n, called by an object in layer n+1 should ideally contain no knowledge of the layer above it. The code could be changed to compare class object canonical name strings rather than the class object itself, so imports for calling classes are not needed in the callable class - but this creates a maintenance problem as refactoring tools won't automatically change the full class names in the string values and the compiler can't tell you if a class name does not exist.

That said, I still think something like this could help stop the proliferation of spaghetti code.

Source Code

The annotations and aspect code are provided in the callable-from project, with an example usage project similar to the diagram at the start of this post provided in the callable-from-usage project.

Sunday, 22 April 2018

Dynamically Typed Stacks Make Me Nervous

Ten years ago Ted Dziuba wrote Python Makes Me Nervous, I agree with everything he wrote back then - I suppose I'm what Steve Yegge would call a Software Conservative. Ten years on, the static vs dynamic language debate is no closer to being over and now what makes *me* really nervous is entire dynamically typed system stacks.

To be more accurate, what I mean by dynamically typed stacks is: systems built with dynamically typed languages and composed of schema-less services, end-to-end. Let me explain ...

When I was a young programmer, if you wanted to created a web service you used XML-RPC or SOAP. I liked SOAP (yeah, I said it!), with a well defined WSDL and some XSD you knew exactly what your client/server was going to send/receive. You generated client code and server side stub classes with Apache Axis and you got serialisation, de-serialisation, parsing, validation and error handling all for free.

Now everyone uses REST and JSON. Instead of well defined XML services, RESTful web services have to try and shoehorn requests into a HTTP GET/POST/PUT/DELETE method along with some path parameters and/or query parameters and/or request/response headers. Serialisation and validation for RESTful web services are often made an implementation concern of the application with custom serialisation/de-serialisation handlers and bespoke validation code.

I like Relational databases (You heard me!). With a well defined schema you know exactly what data you're going to store and retrieve. Database constraints enforce data correctness and referential integrity and it all gets managed for free in one place.

Now we have schema-less NoSQL databases. These types of data stores are supposedly popular because of their horizontal scalability and fault tolerance across network partitions, but in reality, they are popular because they can be used as a data dumping ground with no need for data modelling, schema design, normalisation/de-normalisation, transaction handling, index design, query plan analysis or need to learn a query language. Data consistency, typing, referential integrity, transactions etc. are all concerns pushed on to the application to implement.

Over the last ten years, knowing fuck all about the data your system operates on until run-time has become trendy.

Enough ranting. Lets look at some code, here's a (contrived) example. Let's say we have an existing Java code base, with a PersonController class for persisting a person's contact details, for use in a contacts list application or something. How do you use this API? Well, the classes method signatures and a good IDE tell you everything you need to know with a minimum of key strokes:

I know I need to pass a Person object to the save method. My IDE will tell me what properties I can set on the Person object. The method throws a checked exception if anything goes wrong, or returns a UUID if the entity is persisted correctly. Awesome, I've got everything I need to use this API in my application, I don't need to care about the implementation details.

Now let's do the same thing with Python:

The save method takes one argument, that's all I know. I'd better go have a look at the code...

class PersonController(object):
    URL = 'http://%s:%s/person'

    def __init__(self, host='localhost', port=8888):
        self.url = self.URL % (host, port)

    def save(self, person):
        data = person if isinstance(person, dict) else person.__dict__
        response = requests.post(self.url, data=json.dumps(data))
        if response.status_code != 201:
            raise ControllerSaveException(response.status_code, response.json()['error'])

        return uuid.UUID(response.json()['id'])

... it makes a REST call. person can be anything that can be serialised to JSON and posted to the /person URL. I'd better go try and find the code for the web service...

class Application(tornado.web.Application):

    def __init__(self):
        handlers = [
            (r'/person/?', Handler)
        ]
        tornado.web.Application.__init__(self, handlers)

    def listen(self, address='localhost', port=8888, **kwargs):
        super(Application, self).listen(port, address, **kwargs)

... it's a Tornado REST web service, lets go check the handler class...

class Handler(tornado.web.RequestHandler):

    def __init__(self, application, request, **kwargs):
        super(Handler, self).__init__(application, request, **kwargs)
        self.publisher = Publisher()

    def set_default_headers(self):
        self.set_header('Content-Type', 'application/json')

    def prepare(self):
        try:
            self.request.arguments.update(json.loads(self.request.body))
        except ValueError:
            self.send_error(400, message='Error parsing JSON')

    def post(self):
        response = json.loads(self.publisher.publish(self.request.body.decode('utf-8')))
        self.set_status(response['status'])
        self.write(json.dumps(response))
        self.flush()

... this tells me nothing about what the person object's JSON representation should contain, WTF is Publisher for. I'd better go find that code and take a look...

class Publisher(object):

    def __init__(self, host='localhost', queue='person'):

        self.connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
        self.channel = self.connection.channel()
        result = self.channel.queue_declare(exclusive=True)
        self.callback_queue = result.method.queue
        self.channel.basic_consume(self.on_response, no_ack=True, queue=self.callback_queue)
        self.response = None
        self.correlation_id = None
        self.queue = queue

    def on_response(self, channel, method, properties, body):
        if self.correlation_id == properties.correlation_id:
            self.response = body

    def publish(self, data):

        self.correlation_id = str(uuid.uuid4())
        self.channel.basic_publish(exchange='',
                                   routing_key=self.queue,
                                   properties=pika.BasicProperties(
                                       reply_to=self.callback_queue,
                                       correlation_id=self.correlation_id,
                                   ),
                                   body=data)
        while self.response is None:
            self.connection.process_data_events()

        return self.response

... FFS, it publishes the JSON to a RabbitMQ message queue. I'd better go find the code for the possible consumers ...

class Consumer(object):

    def __init__(self, host='localhost', queue='person', bucket='person'):

        self.connection = pika.BlockingConnection(pika.ConnectionParameters(host))
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue=queue)
        self.channel.basic_qos(prefetch_count=1)
        self.channel.basic_consume(self.on_request, queue=queue)
        self.dataStore = datastore.DataStore(bucket)

    def on_request(self, channel, method, properties, body):

        request = json.loads(body)
        errors = self.validate(request)
        if errors:
            response = {
                'status': 400,
                'error': ', '.join(errors)
            }
        else:
            response = self.save(request)

        self.channel.basic_publish(exchange='',
                                   routing_key=properties.reply_to,
                                   properties=pika.BasicProperties(
                                       correlation_id=properties.correlation_id),
                                   body=json.dumps(response))
        self.channel.basic_ack(delivery_tag=method.delivery_tag)

    def consume(self):
        self.channel.start_consuming()

    def validate(self, request):

        errors = []

        if 'first_name' not in request or not request['first_name']:
            errors.append('Invalid or missing first name')

        if 'last_name' not in request or not request['last_name']:
            errors.append('Invalid or missing last name')

        return errors

    def save(self, request):

        id = str(uuid.uuid4())
        try:
            self.dataStore.save(id, request)
            response = {
                'id': id,
                'status': 201,
            }
        except Exception as e:
            response = {
                'status': 500,
                'error': str(e)
            }

        return response

... some bespoke validation code tells me I have to have first_name and last_name keys in my JSON object. Then the object gets saved to the person bucket in a Riak database. But, what else should be in my object? Let's curl an existing record and have a look...

$ curl http://127.0.0.1:10018/riak/person/b8aa0197-89db-4550-9fba-2c0d4b132b67
{"first_name": "Adrian", "last_name": "Walker"}

... and I'm no closer to knowing exactly what should or shouldn't be in a person object.

What a waste of time.

Source Code

Sunday, 1 April 2018

Riak - Building a Development Environment From Source

Building a Riak development environment, like anything involving Linux, is needlessly complicated for no good reason. This method to build from source worked for me from a clean install of Lubuntu 17.10.1:

First, update your package index and install the dependencies and utilities you will need:

$ sudo apt-get update
$ sudo apt-get install build-essential autoconf libncurses5-dev libpam0g-dev openssl libssl-dev fop xsltproc unixodbc-dev git curl

Next, navigate to user home, download kerl and use it to build and install the Basho version of Erlang (WHY?!?!). These steps took a while to complete on my machine, bear with it:

$ cd ~
$ curl -O https://raw.githubusercontent.com/kerl/kerl/master/kerl
$ chmod a+x kerl
$ ./kerl build git git://github.com/basho/otp.git OTP_R16B02_basho10 R16B02-basho10
$ ./kerl install R16B02-basho10 ~/erlang/R16B02-basho10
$ . ~/erlang/R16B02-basho10/activate

With Erlang installed, clone the Riak source repository from GitHub and build:

$ git clone https://github.com/basho/riak.git
$ cd riak
$ make rel

Finally, create 8 separate copies of Riak to use in a cluster:

$ make devrel

Start 3 (or more) Riak instances:

$ dev/dev1/bin/riak start
$ dev/dev2/bin/riak start
$ dev/dev3/bin/riak start

Then join instances 2 and 3 with instance 1 to form a cluster:

$ dev/dev2/bin/riak-admin cluster join dev1@127.0.0.1
$ dev/dev3/bin/riak-admin cluster join dev1@127.0.0.1

Check and commit the cluster plan:

$ dev/dev3/bin/riak-admin cluster plan
$ dev/dev3/bin/riak-admin cluster commit

Monitor the cluster status until all pending changes are complete:

$ dev/dev3/bin/riak-admin cluster status
---- Cluster Status ----
Ring ready: false

+--------------------+------+-------+-----+-------+
|        node        |status| avail |ring |pending|
+--------------------+------+-------+-----+-------+
| (C) dev1@127.0.0.1 |valid |  up   |100.0|  34.4 |
|     dev2@127.0.0.1 |valid |  up   |  0.0|  32.8 |
|     dev3@127.0.0.1 |valid |  up   |  0.0|  32.8 |
+--------------------+------+-------+-----+-------+

$ dev/dev3/bin/riak-admin cluster status
---- Cluster Status ----
Ring ready: true

+--------------------+------+-------+-----+-------+
|        node        |status| avail |ring |pending|
+--------------------+------+-------+-----+-------+
| (C) dev1@127.0.0.1 |valid |  up   | 34.4|  --   |
|     dev2@127.0.0.1 |valid |  up   | 32.8|  --   |
|     dev3@127.0.0.1 |valid |  up   | 32.8|  --   |
+--------------------+------+-------+-----+-------+

Check the cluster member status:

$ dev/dev3/bin/riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      34.4%      --      'dev1@127.0.0.1'
valid      32.8%      --      'dev2@127.0.0.1'
valid      32.8%      --      'dev3@127.0.0.1'
-------------------------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Congratulations, you have a development Riak cluster. Test the cluster by writing some data to a node:

$ curl -XPUT http://127.0.0.1:10018/riak/test/helloworld -H "Content-type: application/json" --data-binary "Hello World!"

Use a browser to read the data from each node:
http://127.0.0.1:10018/riak/test/helloworld
http://127.0.0.1:10028/riak/test/helloworld
http://127.0.0.1:10038/riak/test/helloworld

Thursday, 8 February 2018

Tell 'em Steve-Dave!

SoundCloud's web interface is rubbish for downloading podcasts, but their API is pretty good, so here's a handy Python script for downloading all of your favourite Tell 'em Steve-Dave! episodes:

import os.path
import re
import requests

API_URL = "http://api.soundcloud.com"
TRACKS_URL = API_URL + "/users/%(USER_ID)s/tracks" \
                       "?client_id=%(CLIENT_ID)s" \
                       "&offset=%(OFFSET)s" \
                       "&limit=%(LIMIT)s" \
                       "&format=json"
DOWNLOAD_URL = "https://api.soundcloud.com/tracks/%(TRACK_ID)s/download?client_id=%(CLIENT_ID)s"
CHUNK_SIZE = 16 * 1024
TESD_USER_ID = "79299245"
CLIENT_ID = "3b6b877942303cb49ff687b6facb0270"
LIMIT = 10
offset = 0

while True:

    url = TRACKS_URL % {
        "USER_ID": TESD_USER_ID,
        "CLIENT_ID": CLIENT_ID,
        "LIMIT": LIMIT,
        "OFFSET": offset
    }

    tracks = requests.get(url).json()

    if not tracks:
        break

    tracks = [(track["id"], track["title"]) for track in tracks]

    for (id, title) in tracks:

        title = str(re.sub('[^A-Za-z0-9]+', '_', title)).strip('_')
        
        url = DOWNLOAD_URL % {"TRACK_ID": id, "CLIENT_ID": CLIENT_ID}

        filename = "%s.mp3" % title
        print "downloading: %s from %s" % (filename, url)

        if os.path.exists(filename):
            continue;

        request = requests.get(url, stream=True)

        with open(filename + ".tmp", 'wb') as fd:
            chunks = request.iter_content(chunk_size=CHUNK_SIZE)
            for chunk in chunks:
                fd.write(chunk)
                
        os.rename(filename + ".tmp", filename)

    offset = offset + LIMIT

4 colors 4 life