Saturday 4 October 2014

Continued Fraction Database File System

After reading Joe Celko's SQL for Smarties on representing hierarchies in SQL databases, I wanted to have a go at creating a database backed file system.

Representing a file system using the Adjacency List Model detailed in chapter 2 is probably good enough for this sort of task, but that's a bit dull, and the Nested Set Model from chapter 4 is a cool way of looking at trees which I would have never have thought of.

Almost immediately the Nested Set model starts to look like a bad idea for a file system due to the performance implications of inserting into the middle of the tree.

Chapter 5 introduces the use of rational numbers and the Nested Interval Model from Vadim Tropashko, and at this point I quickly began not to care.

Then I came across this short but sweet report by Dan Hazel, Using rational numbers to key nested sets.

The paper quickly covers the Nested Set Model as described by Celko and touches on the similarity of the method presented to that of an encoding by Tropashko.

The paper describes a method of using rational numbers as node keys, and a way of calculating their numerators and denominators using the node positions in the tree represented as a continued fraction.

I wasn't taught continued fractions at school, so I had to go brush up before understanding the paper. It's actually a very interesting topic and there are a few good YouTube videos which can bring you up to speed. I liked this one.

I wont describe Dan Hazels report any further as I can't do it justice – it's very good, go read it now, and come back for a look at my implementation using Java and PostgreSQL.

Welcome back. First I needed a way of converting from a continued fraction node path into a fraction numerator and denominator.

  
public static int[] fraction(final int[] c) {

  int[] f = F[0];

  for (int i = c.length - 1; i >= 0; i--) {
    f = add(F[1], invert(f));
    f = add(fraction(c[i], 1), invert(f));
  }

  return f;
}

This function takes an array of integers which represent the continued fraction node path and returns a fraction as a 2 element integer array, containing the fractions numerator and denominator.

This operation is reversible and the function below takes a fraction as an integer array numerator and denominator and returns a continued fraction.

  
public static int[] continued(final int[] f) {

  int[] a = f;

  List<Integer> c = new ArrayList<>();
  while (a[0] > 0) {

    int i = a[0] / a[1];
    c.add(i);

    a = invert(subtract(a, fraction(i, 1)));
    a = invert(subtract(a, F[1]));
  }

  return toArray(c);
}

These are the two most important functions in a class which contains other operations required to manipulate fractions.

Fraction.java

package org.adrianwalker.continuedfractions;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

public final class Fraction {

  private static final int[][] F = {
    {0, 1},
    {1, 1}
  };
  private static final int PRECISION = 16;

  private Fraction() {
  }

  public static int[] fraction(final int n, final int d) {

    return new int[]{n, d};
  }

  public static int[] fraction(final int[] pf, final int c, final int[] spf) {

    return fraction(pf[0] + c * spf[0], pf[1] + c * spf[1]);
  }

  public static int[] fraction(final int[] c) {

    int[] f = F[0];

    for (int i = c.length - 1; i >= 0; i--) {
      f = add(F[1], invert(f));
      f = add(fraction(c[i], 1), invert(f));
    }

    return f;
  }

  public static int[] continued(final int[] f) {

    int[] a = f;

    List<Integer> c = new ArrayList<>();
    while (a[0] > 0) {

      int i = a[0] / a[1];
      c.add(i);

      a = invert(subtract(a, fraction(i, 1)));
      a = invert(subtract(a, F[1]));
    }

    return toArray(c);
  }

  public static BigDecimal decimal(final int[] f) {

    return BigDecimal.valueOf(f[0]).divide(BigDecimal.valueOf(f[1]), PRECISION, RoundingMode.HALF_DOWN);
  }

  public static int[] add(final int[] f1, final int[] f2) {

    return fraction((f1[0] * f2[1]) + (f2[0] * f1[1]), f2[1] * f1[1]);
  }

  public static int[] subtract(final int[] f1, final int[] f2) {

    return fraction((f1[0] * f2[1]) - (f2[0] * f1[1]), f1[1] * f2[1]);
  }

  public static int[] invert(final int[] f) {

    return fraction(f[1], f[0]);
  }

  private static int[] toArray(final List<Integer> l) {

    int[] a = new int[l.size()];

    for (int i = 0; i < a.length; i++) {
      a[i] = l.get(i);
    }

    return a;
  }
}

Next I needed a class to manipulate Matrices and perform the sub tree moving calculations as detailed in the paper.

Matrix.java

package org.adrianwalker.continuedfractions;

public final class Matrix {

  private Matrix() {
  }

  public static int[][] matrix(
          final int M00, final int M01,
          final int M10, final int M11) {

    return new int[][]{{M00, M01}, {M10, M11}};
  }

  public static int[][] multiply(final int[][] M1, final int[][] M2) {

    return matrix(
            M1[0][0] * M2[0][0] + M1[0][1] * M2[1][0],
            M1[0][0] * M2[0][1] + M1[0][1] * M2[1][1],
            M1[1][0] * M2[0][0] + M1[1][1] * M2[1][0],
            M1[1][0] * M2[0][1] + M1[1][1] * M2[1][1]
    );
  }

  public static int[][] invert(final int[][] M) {

    return matrix(-M[1][1], M[0][1], M[1][0], -M[0][0]);
  }

  public static int[][] moveSubtree(final int[][] p0, final int m, final int[][] p1, int n, final int[][] M) {

    return multiply(multiply(multiply(p1, matrix(1, 0, m - n, 1)), invert(p0)), M);
  }
}

With the basic calculations nailed, next up is the database. The schema is a single table which uses the real value of the rational representation of the node as the primary key (id). Along with the primary key, the fractions numerator and denominator are stored (nv and dv), then the id and numerator and dominator of the nodes next largest sibling in the tree.

The level of the node in the tree is also stored, simply to help limit the depth of tree queries later on. And also the name of the node – what will be the file or directory name is stored in the row, along with a file content oid.

I've chosen PostgreSQL for this implementation - it provides an in row BYTEA type for storing binary data, or an off table Binary Large Object storage mechanism, with data accessible by an OID key. I've gone with the OIDs as they provide better support for IO streams, but the code could be easily modified to support BYTEA columns, or any other BLOB storage mechanism you might want to use in your RDBMS implementation.

files.sql

CREATE TABLE files
(
  id numeric NOT NULL,
  nv integer NOT NULL,
  dv integer NOT NULL,
  sid numeric NOT NULL,
  snv integer NOT NULL,
  sdv integer NOT NULL,
  level integer NOT NULL,
  name character varying NOT NULL,
  content oid NOT NULL,
  CONSTRAINT files_pkey PRIMARY KEY (id)
);

CREATE INDEX files_name_idx ON files (name);
CREATE INDEX files_level_idx ON files (level);

Next the Java database access code to perform basic operations on the database. This module uses continued fraction paths as inputs to almost all the methods, and is intended for lower level access to files held in the database.

FilesDAO.java

package org.adrianwalker.continuedfractions.filesystem;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import static org.adrianwalker.continuedfractions.Fraction.decimal;
import static org.adrianwalker.continuedfractions.Fraction.fraction;
import static org.adrianwalker.continuedfractions.Matrix.matrix;
import static org.adrianwalker.continuedfractions.Matrix.moveSubtree;
import static org.adrianwalker.continuedfractions.filesystem.Path.parent;
import static org.adrianwalker.continuedfractions.filesystem.Path.sibling;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

public final class FilesDAO {

  private static final String WRITE
          = "insert into files (id, nv, dv, sid, snv, sdv, level, name, content) "
          + "values(?, ?, ?, ?, ?, ?, ?, ?, ?)";
  private static final String READ
          = "select id, nv, dv, sid, snv, sdv, level, name, content "
          + "from files "
          + "where id = ?";
  private static final String TREE
          = "select id, nv, dv, sid, snv, sdv, level, name, content "
          + "from files "
          + "where id >= ? "
          + "and id < (select sid from files where id = ?)";
  private static final String CHILDREN
          = TREE
          + " and level = ?";
  private static final String CHILD
          = CHILDREN
          + " and name = ?";
  private static final String LAST_CHILD
          = CHILDREN
          + " order by id desc "
          + "limit 1";
  private static final String RENAME
          = "update files "
          + "set name = ? "
          + "where id = ?";
  private static final String MOVE
          = "update files "
          + "set id = ?, nv = ?, dv = ?, sid = ?, snv = ?, sdv = ?, level = ? "
          + "where id = ?";
  private static final String REMOVE
          = "delete from files "
          + "where id >= ? "
          + "and id < (select sid from files where id = ?)";
  private static final String CLEAR = "delete from files";
  private static final String ORDER_BY_ID = " order by id";
  private static final String ORDER_BY_NAME = " order by name";

  private final Connection connection;
  private final LargeObjectManager lom;

  public FilesDAO(final Connection connection) throws SQLException {

    this.connection = connection;
    this.lom = ((org.postgresql.PGConnection) connection).getLargeObjectAPI();
  }

  private PreparedStatement prepareStatement(final String sql) throws SQLException {

    return connection.prepareStatement(sql);
  }

  public int clear() throws SQLException {

    PreparedStatement remove = prepareStatement(CLEAR);

    return remove.executeUpdate();
  }

  public File write(final String filename, final int... path) throws SQLException {

    int[] f = fraction(path);
    int[] sf = fraction(sibling(path));
    int level = path.length;

    BigDecimal id = decimal(f);
    BigDecimal sid = decimal(sf);
    long content = lom.createLO();

    PreparedStatement write = prepareStatement(WRITE);
    write.setBigDecimal(1, id);
    write.setInt(2, f[0]);
    write.setInt(3, f[1]);
    write.setBigDecimal(4, sid);
    write.setInt(5, sf[0]);
    write.setInt(6, sf[1]);
    write.setInt(7, level);
    write.setString(8, filename);
    write.setLong(9, content);

    write.executeUpdate();

    return new File(id, f[0], f[1], sid, sf[0], sf[1], level, filename, content);
  }

  public File read(final int... path) throws SQLException {

    BigDecimal id = decimal(fraction(path));

    PreparedStatement read = prepareStatement(READ);
    read.setBigDecimal(1, id);

    ResultSet rs = read.executeQuery();

    return toFile(rs);
  }

  public File[] children(final int... path) throws SQLException {

    BigDecimal id = decimal(fraction(path));

    PreparedStatement children = prepareStatement(CHILDREN + ORDER_BY_NAME);
    children.setBigDecimal(1, id);
    children.setBigDecimal(2, id);
    children.setInt(3, path.length + 1);

    return toFiles(children.executeQuery());
  }

  public File child(final String name, final int... path) throws SQLException {

    BigDecimal id = decimal(fraction(path));

    PreparedStatement child = prepareStatement(CHILD);
    child.setBigDecimal(1, id);
    child.setBigDecimal(2, id);
    child.setInt(3, path.length + 1);
    child.setString(4, name);

    return toFile(child.executeQuery());
  }

  public File lastChild(final int... path) throws SQLException {

    BigDecimal id = decimal(fraction(path));

    PreparedStatement lastChild = prepareStatement(LAST_CHILD);
    lastChild.setBigDecimal(1, id);
    lastChild.setBigDecimal(2, id);
    lastChild.setInt(3, path.length + 1);

    return toFile(lastChild.executeQuery());
  }

  public File[] tree(final int... path) throws SQLException {

    BigDecimal id = decimal(fraction(path));

    PreparedStatement tree = prepareStatement(TREE + ORDER_BY_ID);
    tree.setBigDecimal(1, id);
    tree.setBigDecimal(2, id);

    return toFiles(tree.executeQuery());
  }

  public int rename(final String filename, final int... path) throws SQLException {

    BigDecimal id = decimal(fraction(path));

    PreparedStatement rename = prepareStatement(RENAME);
    rename.setString(1, filename);
    rename.setBigDecimal(2, id);

    return rename.executeUpdate();
  }

  public int remove(final int... path) throws SQLException {

    for (File file : tree(path)) {
      lom.delete(file.getContent());
    }

    BigDecimal id = decimal(fraction(path));

    PreparedStatement remove = prepareStatement(REMOVE);
    remove.setBigDecimal(1, id);
    remove.setBigDecimal(2, id);

    return remove.executeUpdate();
  }

  public int[] move(final int[] from, final int[] to) throws SQLException, IOException {

    return moveCopy(MOVE, from, to);
  }

  public int[] copy(final int[] from, final int[] to) throws SQLException, IOException {

    return moveCopy(WRITE, from, to);
  }

  private int[] moveCopy(final String sql, final int[] from, final int[] to) throws SQLException, IOException {

    int[] p = parent(from);
    int[] pf0 = fraction(p);
    int[] psf0 = fraction(sibling(p));
    int[] pf1 = fraction(to);
    int[] psf1 = fraction(sibling(to));

    int m = 1;
    File lc = lastChild(to);
    if (null != lc) {
      m = (lc.getSnv() - pf1[0]) / psf1[0];
    }

    int n = from[from.length - 1];

    int[][] p0 = matrix(pf0[0], psf0[0], pf0[1], psf0[1]);
    int[][] p1 = matrix(pf1[0], psf1[0], pf1[1], psf1[1]);

    PreparedStatement move = prepareStatement(sql);

    for (File file : tree(from)) {

      int[][] M0 = matrix(file.getNv(), file.getSnv(), file.getDv(), file.getSdv());
      int[][] M1 = moveSubtree(p0, m, p1, n, M0);

      int[] f = fraction(M1[0][0], M1[1][0]);
      int[] sf = fraction(M1[0][1], M1[1][1]);
      BigDecimal id = decimal(f);
      BigDecimal sid = decimal(sf);
      int level = to.length + (file.getLevel() - p.length);

      move.setBigDecimal(1, id);
      move.setInt(2, f[0]);
      move.setInt(3, f[1]);
      move.setBigDecimal(4, sid);
      move.setInt(5, sf[0]);
      move.setInt(6, sf[1]);
      move.setInt(7, level);

      switch (sql) {

        case MOVE:
          move.setBigDecimal(8, file.getId());
          break;

        case WRITE:
          long oid = lom.createLO();

          move.setString(8, file.getName());
          move.setLong(9, oid);

          Stream.copy(getInputStream(file.getContent()), getOutputStream(oid));

          break;
      }

      move.addBatch();
    }

    return move.executeBatch();
  }

  public InputStream getInputStream(final long oid) throws SQLException {

    LargeObject obj = lom.open(oid, LargeObjectManager.READ);

    return obj.getInputStream();
  }

  public OutputStream getOutputStream(final long oid) throws SQLException {

    LargeObject obj = lom.open(oid, LargeObjectManager.WRITE);

    return obj.getOutputStream();
  }

  private File toFile(final ResultSet rs) throws SQLException {

    File file = null;

    if (rs.next()) {
      file = new File(rs.getBigDecimal(1), rs.getInt(2), rs.getInt(3),
              rs.getBigDecimal(4), rs.getInt(5), rs.getInt(6),
              rs.getInt(7), rs.getString(8), rs.getLong(9));
    }

    return file;
  }

  private File[] toFiles(final ResultSet rs) throws SQLException {

    List<File> l = new ArrayList<>();

    while (rs.next()) {
      l.add(new File(rs.getBigDecimal(1), rs.getInt(2), rs.getInt(3),
              rs.getBigDecimal(4), rs.getInt(5), rs.getInt(6),
              rs.getInt(7), rs.getString(8), rs.getLong(9)));
    }

    return l.toArray(new File[l.size()]);
  }
}

And finally a higher level API for dealing directly with file names and paths in a manner similar to how you might use them on a command line.

FileSystem.java

package org.adrianwalker.continuedfractions.filesystem;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.SQLException;
import static org.adrianwalker.continuedfractions.filesystem.Path.range;

public final class FileSystem {

  private static final String SEPERATOR = "/";
  private static final String EMPTY_STRING = "";
  private final FilesDAO dao;
  private final int[] rootPath;

  public FileSystem(final FilesDAO dao, final int... rootPath) throws FileSystemException {

    this.dao = dao;
    this.rootPath = rootPath;

    try {
      if (null == dao.read(rootPath)) {
        dao.write("", rootPath);
      }
    } catch (final SQLException sqle) {
      throw new FileSystemException(sqle);
    }
  }

  public int[] create(final String path) throws FileSystemException {

    try {
      return path(path, true);
    } catch (final SQLException sqle) {
      throw new FileSystemException(sqle);
    }
  }

  public File[] list(final String path) throws FileSystemException {

    try {
      return dao.children(path(path, false));
    } catch (final SQLException sqle) {
      throw new FileSystemException(sqle);
    }
  }

  public File[] tree(final String path) throws FileSystemException {

    try {
      return dao.tree(path(path, false));
    } catch (final SQLException sqle) {
      throw new FileSystemException(sqle);
    }
  }

  public void write(final String path, final String text) throws FileSystemException {

    OutputStream out = getOutputStream(path);

    try {
      Stream.fromString(text, out);
    } catch (final IOException ioe) {
      throw new FileSystemException(ioe);
    }
  }

  public OutputStream getOutputStream(final String path) throws FileSystemException {

    OutputStream out;

    try {
      out = dao.getOutputStream(dao.read(path(path, true)).getContent());
    } catch (final SQLException sqle) {
      throw new FileSystemException(sqle);
    }

    return out;
  }

  public String read(final String path) throws FileSystemException {

    InputStream in = getInputStream(path);

    if (null == in) {
      return EMPTY_STRING;
    }

    try {
      return Stream.toString(in);
    } catch (final IOException ioe) {
      throw new FileSystemException(ioe);
    }
  }

  public InputStream getInputStream(final String path) throws FileSystemException {

    InputStream in;

    try {
      in = dao.getInputStream(dao.read(path(path, false)).getContent());
    } catch (final SQLException sqle) {
      throw new FileSystemException(sqle);
    }

    return in;
  }

  public void delete(final String path) throws FileSystemException {

    try {
      dao.remove(path(path, false));
    } catch (final SQLException sqle) {
      throw new FileSystemException(sqle);
    }
  }

  public void move(final String from, final String to) throws FileSystemException {

    try {
      dao.move(path(from, false), path(to, true));
    } catch (final SQLException | IOException ex) {
      throw new FileSystemException(ex);
    }
  }

  public void copy(final String from, final String to) throws FileSystemException {

    try {
      dao.copy(path(from, false), path(to, true));
    } catch (final SQLException | IOException ex) {
      throw new FileSystemException(ex);
    }
  }

  private int[] path(final String s, final boolean create) throws SQLException {

    String[] names = s.split(SEPERATOR);
    int[] path = range(rootPath, 0, rootPath.length + (names.length > 0 ? names.length - 1 : 0));

    File f = dao.read(rootPath);

    for (int level = 1; level < names.length; level++) {

      File p = f;

      f = dao.child(names[level], range(path, 0, level));

      if (null != f) {

        path[level] = (f.getNv() - p.getNv()) / p.getSnv();

      } else if (null == f && create) {

        path[level] = 1;
        File lc = dao.lastChild(range(path, 0, level));
        if (null != lc) {
          path[level] = (lc.getSnv() - p.getNv()) / p.getSnv();
        }

        f = dao.write(names[level], range(path, 0, level + 1));
      }
    }

    return path;
  }
}

An example usage of the above high level API for creating directories, reading and writing files, and moving sub trees is below.

Example.java

package org.adrianwalker.continuedfractions.filesystem.example;

import java.sql.Connection;
import java.sql.DriverManager;
import org.adrianwalker.continuedfractions.filesystem.File;
import org.adrianwalker.continuedfractions.filesystem.FileSystem;
import org.adrianwalker.continuedfractions.filesystem.FilesDAO;
import static org.adrianwalker.continuedfractions.filesystem.Printer.print;

public class Example {

  private static final String DRIVER = "org.postgresql.Driver";
  private static final String URL = "jdbc:postgresql://localhost:5432/postgres";
  private static final String USERNAME = "postgres";
  private static final String PASSWORD = "postgres";

  public static void main(final String[] args) throws Exception {

    Class.forName(DRIVER);
    Connection connection = DriverManager.getConnection(URL, USERNAME, PASSWORD);
    connection.setAutoCommit(false);

    FilesDAO dao = new FilesDAO(connection);
    dao.clear();

    FileSystem fs = new FileSystem(dao, 1);

    System.out.println("Create directories\n");

    fs.create("/bin");
    fs.create("/dev");
    fs.create("/etc");
    fs.create("/home");
    fs.create("/sbin");
    fs.create("/usr");

    fs.create("/home/adrian");
    fs.create("/home/other");

    fs.create("/home/adrian/documents/text");
    fs.create("/home/adrian/documents/presentations");
    fs.create("/home/adrian/documents/spreadsheets");

    connection.commit();

    print(fs.tree("/"));

    System.out.println("\nWrite files\n");

    fs.write("/home/adrian/documents/text/test1.txt", "Hello");
    fs.write("/home/adrian/documents/text/test2.txt", "Database");
    fs.write("/home/adrian/documents/text/test3.txt", "File System");
    fs.write("/home/adrian/documents/text/test4.txt", "World!");

    connection.commit();

    print(fs.tree("/"));

    System.out.println("\nMove files\n");

    fs.move("/home/adrian/documents", "/home/other");

    connection.commit();

    print(fs.tree("/"));

    System.out.println("\nPrint files\n");

    for (File file : fs.list("/home/other/documents/text")) {
      System.out.println(fs.read("/home/other/documents/text/" + file.getName()));
    }
  }
}

The above example code produces the output:

Create directories

/
  /bin
  /dev
  /etc
  /home
    /adrian
      /documents
        /text
        /presentations
        /spreadsheets
    /other
  /sbin
  /usr

Write files

/
  /bin
  /dev
  /etc
  /home
    /adrian
      /documents
        /text
          /test1.txt
          /test2.txt
          /test3.txt
          /test4.txt
        /presentations
        /spreadsheets
    /other
  /sbin
  /usr

Move files

/
  /bin
  /dev
  /etc
  /home
    /adrian
    /other
      /documents
        /text
          /test1.txt
          /test2.txt
          /test3.txt
          /test4.txt
        /presentations
        /spreadsheets
  /sbin
  /usr

Print files

Hello
Database
File System
World!

Source Code