Actually Using the Database
Writing a Haskell API Server, Part 1
Chris Wilson’s and my mission for Bendyworks was ostensibly simple: build a secure, database-backed API server in Haskell. We didn’t expect to be treading new-ish ground. When we began architecting, we could’ve started from one of the many burgeoning and maturing Haskell web frameworks, such as Snap or Yesod, but these would’ve been prefabricated overkill for our purposes. Instead, we decided to strike out on our own and see where we arrived.
Free from the shackles of any particular web framework, we could leverage each framework’s best architecture decisions (such as MVC or REST) while eschewing those we found overly-constraining. We started with a single notion: model the domain logic at the type level. This meant our domain should be immediately perceptible from our database constraints and Haskell type signatures alone. What surprised us is how much Haskell we wouldn’t end up writing.
Before we get into that, though, let’s look at how some well-known server frameworks such as Rails handle their database layer.
ActiveRecord and the Dumb Data Store
Rails uses ActiveRecord, a database-agnostic, query-generating object-relational mapper. The tragedy of database-agnostic ORMs is twofold: they can only expose the lowest common denominator of functionality of the databases they query, and their generated SQL can be difficult to reason about.
In ActiveRecord, this lowest common denominator manifests as treating the database as a dumb data store. Almost any data operation more complex than a read or write is lifted to the application layer, a seemingly clear violation of the separation of concerns. This also forces framework or application authors to rewrite logic the database already supplies. We’ll see shortly what this means in practice.
We decided instead to leverage our preferred database, PostgreSQL, as heavily as possible, keeping our database logic centralized and our application thin.
Writing SQL in SQL
As stated earlier, ORMs provide a poor substitute for SQL. If the ORM, like ActiveRecord, provides its own interpretation between the application language and SQL, then the SQL becomes difficult to reason about. If the ORM instead provides a 1-to-1 mapping between the native language and SQL, it merely serves as an obfuscation layer. For this reason, we decided that all SQL would be written in plain SQL.
The strengths of this decision are quickly realized. Let’s look at a sample of our application’s database schema:
```sql
CREATE TABLE contacts (
  id          INTEGER PRIMARY KEY,
  name        VARCHAR NOT NULL,
  email       CITEXT NOT NULL,
  address     VARCHAR,
  url         VARCHAR,
  business_id INTEGER REFERENCES businesses,
  created_at  TIMESTAMP NOT NULL,
  updated_at  TIMESTAMP NOT NULL
);

CREATE UNIQUE INDEX email_on_contacts ON contacts (email);
```
At first glance, this doesn’t differ much from how a Rails or Django-style schema might appear. We’ve copied the Rails decision to mark each API resource with a unique `id` primary key and `created_at`/`updated_at` timestamp columns. We’ve also relegated some of what would be ActiveRecord validations to PostgreSQL: the `NOT NULL` constraints, the case-insensitive `CITEXT` type extension backing the unique index on `email`, and the foreign key ensuring that `business_id` can only reference an existing record on the `businesses` table. Without ever touching application logic, we’ve created strong guards against data inconsistency.
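To see these guards in action, here is a hypothetical psql session against the schema above (names and values are illustrative, the `citext` extension is assumed to be enabled, and the error messages are abbreviated; exact wording varies by PostgreSQL version):

```sql
-- A well-formed row is accepted:
INSERT INTO contacts (id, name, email, created_at, updated_at)
VALUES (1, 'Ada', 'ada@example.com', now(), now());

-- A NULL email is stopped by the NOT NULL constraint:
INSERT INTO contacts (id, name, email, created_at, updated_at)
VALUES (2, 'Bob', NULL, now(), now());
-- ERROR:  null value in column "email" violates not-null constraint

-- CITEXT makes the unique index case-insensitive, so a re-cased
-- duplicate is still caught:
INSERT INTO contacts (id, name, email, created_at, updated_at)
VALUES (3, 'Eve', 'ADA@example.com', now(), now());
-- ERROR:  duplicate key value violates unique constraint "email_on_contacts"

-- A dangling reference is rejected by the foreign key:
INSERT INTO contacts (id, name, email, business_id, created_at, updated_at)
VALUES (4, 'Mal', 'mal@example.com', 999, now(), now());
-- ERROR:  ... violates foreign key constraint
```

None of these rejections required a line of application code.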
The Power of SQL
SQL, and PostgreSQL in particular, can do so much more, though. For instance, we know that every API resource table will have `id` and timestamp columns, and that their behaviors should be constrained:

- `created_at` should be set when a record is created…
- …and idempotent for that record forever after
- `updated_at` should be set each time a record is updated
- none of these fields should be alterable by the API server
These certainly seem like our database’s responsibility, and indeed PostgreSQL can handle these constraints with panache. We start by creating stored procedures at the top of our schema file:
```sql
CREATE FUNCTION on_record_insert() RETURNS trigger AS $$
DECLARE
  id_sequence VARCHAR;
BEGIN
  id_sequence := TG_ARGV[0];
  NEW.id := nextval(id_sequence);
  NEW.created_at := now();
  NEW.updated_at := now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION on_record_update() RETURNS trigger AS $$
BEGIN
  NEW.id := OLD.id;
  NEW.created_at := OLD.created_at;
  NEW.updated_at := now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```
Notice that our `on_record_insert()` procedure takes the name of a sequence (via the trigger-argument array `TG_ARGV`) so that each table can maintain a separate sequence for its `id` column.
Next, we attach these procedures to their respective tables (after the table is created) with triggers:
```sql
CREATE SEQUENCE contact_ids START 1;

CREATE TRIGGER contacts_insert
  BEFORE INSERT ON contacts
  FOR EACH ROW EXECUTE PROCEDURE on_record_insert('contact_ids');

CREATE TRIGGER contacts_update
  BEFORE UPDATE ON contacts
  FOR EACH ROW EXECUTE PROCEDURE on_record_update();
```
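With the triggers attached, the database owns these columns entirely. A quick psql sketch of what that looks like (values are illustrative, assuming the `contacts` table and triggers above):

```sql
-- id, created_at, and updated_at can be omitted entirely: the
-- BEFORE INSERT trigger fills them in before the NOT NULL checks run.
INSERT INTO contacts (name, email)
VALUES ('Ada', 'ada@example.com');

-- Even a deliberate attempt to rewrite history is undone: the BEFORE
-- UPDATE trigger restores id and created_at from OLD and refreshes
-- updated_at.
UPDATE contacts
SET id = 42, created_at = '1970-01-01', name = 'Ada L.'
WHERE email = 'ada@example.com';

-- id still comes from the contact_ids sequence, created_at keeps its
-- original value, and only updated_at has moved forward:
SELECT id, created_at, updated_at
FROM contacts WHERE email = 'ada@example.com';
```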
Applying these constraints and procedures to our database API resources ensures data consistency at the database level. Thanks to the strength of PostgreSQL, no application or concurrency errors can break our consistency.
The use of stored procedures is a hotly contested topic within web application development circles. We like to believe this demonstrates the benefit of a surgical use of stored procedures for generalized data constraints across multiple tables.
Having a tightly-constrained schema alone would make this post worthwhile, but the fine folks at PostgreSQL have provided much more.
For instance, we know that our architecture requires token-based authentication. In Rails, this might be handled by dropping an overly behavior-constraining engine such as Devise into the project. In Haskell, we might want to import the Bcrypt module and write newtypes and functions to convert or compare hashed tokens. But looking up the correct user based on a verified token feels like a data problem, and indeed PostgreSQL can handle this problem with ease.
First, we’ll need to enable the built-in pgcrypto and uuid-ossp extensions, at the top of our schema, which will do all the heavy lifting for us.

```sql
CREATE EXTENSION pgcrypto;
CREATE EXTENSION "uuid-ossp";
```
pgcrypto provides two functions, `gen_salt()` and `crypt()`, that entirely handle creating and comparing hashed token values for us. uuid-ossp provides `uuid_generate_v4()`, which will generate UUIDs. With these extensions enabled, creating or looking up a user from Haskell with a secure token is as concise as executing the following statements (seen here in hasql quasiquoted form)…
```haskell
findUserByLogin :: Login -> Statement a
findUserByLogin login =
  [q| SELECT id, contact_id
      FROM users
      WHERE id = ? AND crypt(?, token_hash) = token_hash
  |] (userId login) (tokenHash login)

createUserWithToken :: Connection -> IO (ConvertResult [Login])
createUserWithToken =
  [q| WITH token_table AS
        (SELECT uuid_generate_v4() :: VARCHAR AS token)
      INSERT INTO users (token_hash)
        SELECT crypt(token, gen_salt('bf', 8)) FROM token_table
      RETURNING id, (SELECT token FROM token_table)
  |]
```
Here, `id` and `token_hash` are columns on the `users` table for the user’s ID and hashed token values, respectively.
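The `crypt()`/`gen_salt()` mechanics behind those statements can be exercised directly in psql. A sketch, assuming pgcrypto is enabled (the token value is hypothetical):

```sql
-- gen_salt('bf', 8) produces a bcrypt salt with cost factor 8;
-- crypt() embeds the algorithm, cost, and salt in the resulting hash:
SELECT crypt('s3cret-token', gen_salt('bf', 8));
-- e.g. $2a$08$...  (the prefix identifies algorithm and cost)

-- Verification passes the stored hash back as the salt argument, so
-- crypt() re-hashes with the same parameters and a simple equality
-- check suffices:
SELECT crypt('s3cret-token', stored.hash) = stored.hash AS token_ok
FROM (SELECT crypt('s3cret-token', gen_salt('bf', 8)) AS hash) AS stored;
-- token_ok is true
```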
As an aside, did you notice the `RETURNING` clauses in the statements above? `RETURNING`, a PostgreSQL extension, allows us to specify exactly what data we’d like back from any SQL query, letting us both create a user and return their login credentials in a single statement. In Rails, this would’ve required two distinct queries.
The use of `WITH` is more subtle. We wrap the generation of the user token in a `WITH` clause so that we can refer to it later in the SQL query. We do this because we want to return the unhashed token to the user for storage, though we’ll only store the hashed version.
Enabling PostgreSQL to handle authentication for us provides another benefit beyond code simplicity. Because `crypt()` stores the hashing algorithm used as part of the hash result, all of our tokens are migration-safe! Should the application author wish to change hashing algorithms, they have only to alter the `createUserWithToken` query, and all future tokens will be stored using the new algorithm; every token is compared by the algorithm with which it was originally hashed.
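This migration safety is easy to demonstrate: a hash minted under one pgcrypto algorithm keeps verifying after new tokens switch to another. A sketch, using pgcrypto’s own algorithm names:

```sql
-- Mint a legacy hash with the md5 scheme (new tokens can meanwhile
-- be created with gen_salt('bf', 8)):
SELECT crypt('legacy-token', legacy.hash) = legacy.hash AS still_valid
FROM (SELECT crypt('legacy-token', gen_salt('md5')) AS hash) AS legacy;
-- still_valid is true: crypt() reads the algorithm out of the stored
-- hash, so old tokens never need re-hashing.
```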
A likely critique of the above statements is that they do too much outside of Haskell’s type safety. Consider, though, that this query is already used to access the database outside of Haskell’s safety. Regardless of the query, we’ll have to test this point of IO to ensure we obtain the expected results. Having the query do more work for us does not make it any less safe, only more concise.
The Experiment Continues
This is only a small slice of what a smart database such as PostgreSQL can provide when relied upon fully. As we work towards the 1.0 release of our API server, we’ll continue to discover what can and should be handled by the database, and what is better provided by Haskell.
In following blog posts, we’ll talk about how to model our data in Haskell to pick up where PostgreSQL leaves off, and how to structure our controllers and routes to ensure controller-action security at the type level.
In the meantime, we’d like to thank Bendyworks for providing the time required for us to make inroads into this relatively new approach to API server architecture. This type of downtime is required for developers and consultancies alike to grow in their expertise while maintaining a healthy work/life balance.