Migrating SQL applications to Cassandra – Strategy

Today’s typical Java application consists of SQL statements defined either using Data Access Object pattern or Java Persistence Query Language (JPQL). These SQLs can be categorized into three categories:

  • Simple SQL: These are one table SQLs like select * from Employee where employeeID=1
  • Simple multi-table SQL: These are simple SQLs but involve more than one table, for example select e1.* from Employee e1 where exists (select 'true' from Course c1 where c1.taker = e1.employeeID and c1.name = 'Cassandra basics')
  • Complex SQL: These are multi-table, multi-join SQLs with complex logic of EXISTS, NOT EXISTS, etc. For example, select e1.* from Employee e1 where exists (select 'true' from Course c1 where c1.taker = e1.employeeID and c1.name = 'Cassandra basics') and NOT EXISTS (select 'true' Grade g1 where g1.courseID = c1.courseID and g1.grade = 'Failed'))

Strategy for migrating simple SQL is straight-forward – use Cassandra Java driver to create/retrieve data. This has minimal impact on the application.

Simple multi-table SQL and complex SQL can be handled in two ways – option 1) middle-tier graph construction using Object Persistence API for C* or option 2) use of Cassandra jdbc driver such as Apache Presto or Apache Drill. Although these two may seem different, architecturally they are the same – Apache Presto or Apache Drill would essentially be doing the same thing – building a graph of objects on cluster nodes and converting it to SQL compliant result set. Using JDBC has minimal impact on the application.

There are two reasons why we may want to use the first option of executing multiple queries and composing your own object graph : 1) performance: with your own graph construction you can achieve significantly higher performance which may be the pressing need and 2) avoid duplicate retrievals: you may need the data, immediately or in sub-sequent calls, that is actually pulled by the jdbc driver internally (to prepare the resultset) but is not returned (for architectural reasons) as part of the resultset.

One thought on “Migrating SQL applications to Cassandra – Strategy

Leave a comment