Distributed Object Persistence API for Cassandra

Distributed Object Persistence API for Cassandra (OPAC) is a distributed object-relational mapping persistence API for saving and retrieving Java objects and relationship graphs from Cassandra.

OPAC also enables definition-driven management of C* db objects such as schema, tables, materialized views, constraints, etc.

OPAC can retrieve objects from multiple NoSQL and/or relational data sources and construct distributed object graphs. It uses an object query language (OQL) to perform complex queries. OPAC is developed specifically with C* in mind: for example, while performing complex queries it retrieves only the minimally required columns instead of fetching all columns.

Another key feature of OPAC is configuration-driven management of composite-key-columns (de-normalized data). OPAC automatically creates and maintains composite-key-column data, eliminating any need for manual coding.

OPAC also provides copy functionality. Data can be loaded into a C* table using either POJOs or RDBMS data-source tables.

OPAC is built with strict adherence to the loose-coupling principle. As you will see in the Pet example below, no part of your Java code extends or implements any OPAC classes or interfaces!
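
To make the loose-coupling point concrete, here is a minimal sketch of what such a plain bean could look like. The field names follow the petID and name columns used in the queries later in this post; the int type, the constructors and the class body are assumptions for illustration, and the actual PetBean shipped with opac-cli-dist may differ.

```java
// Hypothetical sketch of a plain Pet bean; not the actual class from opac-cli-dist.
// It extends nothing and implements no framework interface.
final class PetBean {
    private int petID;    // assumed type; primary key used by Pet.findByPrimaryKey
    private String name;  // column used by Pet.findByName

    PetBean() { }         // no-arg constructor, conventional for beans

    PetBean(int petID, String name) {
        this.petID = petID;
        this.name = name;
    }

    public int getPetID() { return petID; }
    public void setPetID(int petID) { this.petID = petID; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    @Override
    public String toString() {
        return "PetBean{petID=" + petID + ", name='" + name + "'}";
    }

    public static void main(String[] args) {
        System.out.println(new PetBean(1, "Judo"));
    }
}
```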

So how is OPAC different from other Java persistence APIs?

Cassandra is a NoSQL database and supports only simple one-table queries. To satisfy complex query requirements, it is left to Java developers to fetch data from multiple tables and compose it into the resulting object(s).

Let’s say the development effort to join, query and compose a result costs N for two tables. Per graph theory, n nodes have at most n(n−1)/2 edges, one for each pair of tables that may need to be joined. So for a three-table scenario, the cost would be 3 x (3-1)/2 = 3 N. For ten tables, it would be 10 x (10-1)/2 = 45 N! For a hundred tables, it would be 100 x (100-1)/2 = 9900/2 = 4950 N!
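
The arithmetic above can be checked with a few lines of Java. This is only a sketch of the pair-count formula; the class and method names are made up for illustration.

```java
// Sketch: number of distinct table pairs among n tables, n(n-1)/2.
final class JoinCost {
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        // Table counts used in the text: 2, 3, 10 and 100.
        for (long n : new long[] {2, 3, 10, 100}) {
            System.out.println(n + " tables -> " + pairCount(n) + " N");
        }
    }
}
```

Running it prints 1 N, 3 N, 45 N and 4950 N for 2, 3, 10 and 100 tables respectively.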

As you can see, as the system grows in functionality (the number of tables increases), the cost to join, query and compose results grows quadratically. OPAC provides declarative join, query and result computation and thus eliminates this middle-tier development effort.

Let’s work with a sample Pet db with schema:

Using OPAC you can write a join as below:

Hobby.findByName('bark') as e1
->PetHobby.findByHobbyID(e1.hobbyID) as e2
->return Pet.findByPrimaryKey(e2.petID);

Here Pet, PetHobby and Hobby are defined as components and findByName, findByHobbyID and findByPrimaryKey are the single table queries.

For sub-queries Hobby.findByName and PetHobby.findByHobbyID, OPAC only retrieves columns necessary for joins resulting in higher performance.
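
For comparison, here is a rough sketch of the same three-step lookup written by hand in plain Java, which is the kind of middle-tier code the declarative join is meant to replace. It uses hypothetical in-memory lists rather than real Cassandra tables. The record types, the method name petsWithHobby and the sample rows (the 'sleep' hobby and every ID other than hobby #1) are assumptions chosen to agree with the answers quoted in this post; they are not part of OPAC.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a hand-written join over in-memory stand-ins for the three tables.
final class ManualJoin {
    record Hobby(int hobbyID, String name) { }
    record PetHobby(int petID, int hobbyID) { }
    record Pet(int petID, String name) { }

    // Hypothetical sample rows (assumed, not the post's actual initial data).
    static final List<Hobby> HOBBIES = List.of(
            new Hobby(1, "trail-running"), new Hobby(2, "bark"), new Hobby(3, "sleep"));
    static final List<PetHobby> PET_HOBBIES = List.of(
            new PetHobby(1, 1), new PetHobby(1, 2), new PetHobby(2, 3));
    static final List<Pet> PETS = List.of(
            new Pet(1, "Judo"), new Pet(2, "Vithu"), new Pet(3, "Rock"));

    static List<Pet> petsWithHobby(String hobbyName) {
        List<Pet> result = new ArrayList<>();
        for (Hobby h : HOBBIES) {
            if (!h.name().equals(hobbyName)) continue;       // step 1: Hobby.findByName
            for (PetHobby ph : PET_HOBBIES) {
                if (ph.hobbyID() != h.hobbyID()) continue;   // step 2: PetHobby.findByHobbyID
                for (Pet p : PETS) {
                    if (p.petID() == ph.petID()) {           // step 3: Pet.findByPrimaryKey
                        result.add(p);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(petsWithHobby("bark"));
    }
}
```

Note that only hobbyID and petID are carried between the steps, which mirrors the point above about retrieving just the columns needed for the join.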

Before getting further into the details, a quick review of terminology: OPAC is a Java driver that can be invoked from a Java program to access managed objects (defined in the schema) and perform CRUD operations. OPAC-CLI is a command-line interface for working with the OPAC Java driver. Now let’s start with OPAC-CLI.

Using OPAC-CLI

OPAC Command-line Utility
(opac-cli-dist.zip contains opac-cli-all.jar, opac-config.xml, opac-sample-module.xml, POJOs and initial-data loaders.)

# Extract zip:
unzip opac-cli-dist.zip

# Change into the extracted directory
cd opac-cli-dist

# Compile included Pet and Pet data initializer beans
javac -cp .:opac-cli-all.jar *.java pojo/*.java data/*.java

# Edit ./opac-config.xml for your C* config:


# Run opac-cli
java -cp .:opac-cli-all.jar Console ./opac-config.xml

# Run the following command at the command prompt to initialize schema:
> initialize schema;

# Run the following command to initialize data in Pet, Hobby and PetHobby tables
> initialize data;
 
# Check that all is okay by running the following query at the command prompt:
> return Pet.findAll();

If the sample Pet db is up and running, it should return a result containing the few records created by the data initializers!

The sample Pet db schema consists of three tables: Pet, Hobby and PetHobby. The initially created pet-hobbies are as below:

Let’s run the following queries at the OPAC command prompt:

# query 1: Get all pets whose hobby is 'bark' (answer: Judo)
Hobby.findByName('bark') as r1
-> PetHobby.findByHobbyID(r1.hobbyID) as r2
-> return Pet.findByPrimaryKey(r2.petID);

# query 2: Get all pets that have a hobby! (answer: Judo and Vithu)
return Pet.findAll() as r1
-> exists PetHobby.findByPetID(r1.petID);

# query 3: Get all pets who don't have a hobby! (answer: Rock)
return Pet.findAll() as r1
-> not exists PetHobby.findByPetID(r1.petID);

# query 4: Get all pets that don't have hobby #1 - 'trail-running' (answer: Vithu and Rock)
return Pet.findAll() as r1
-> not exists PetHobby.findByPetID(r1.petID) as rp1
-> filter.keep(rp1.hobbyID = 1);

# query 5: Get all pets that have hobby #1 - 'trail-running' (answer: Judo)
return Pet.findAll() as r1
-> exists PetHobby.findByPetID(r1.petID) as rp1
-> filter.keep(rp1.hobbyID = 1);

# query 6: Get union of two result-sets (answer: Judo and Rock)
return Pet.findByName('Judo')
-> union Pet.findByName('Rock');
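
For readers who think in terms of collections, queries 2 to 6 behave like a semi-join (exists), an anti-join (not exists) and a set union. The sketch below mimics them over hypothetical in-memory rows. It is not OPAC code; the sample PetHobby rows are assumptions chosen to reproduce the answers quoted above, and treating union as duplicate-free is also an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch: exists / not exists / union expressed over in-memory lists.
final class SetStyleQueries {
    record Pet(int petID, String name) { }
    record PetHobby(int petID, int hobbyID) { }

    // Hypothetical sample rows (assumed, not the post's actual initial data).
    static final List<Pet> PETS = List.of(
            new Pet(1, "Judo"), new Pet(2, "Vithu"), new Pet(3, "Rock"));
    static final List<PetHobby> PET_HOBBIES = List.of(
            new PetHobby(1, 1), new PetHobby(1, 2), new PetHobby(2, 3));

    // Semi-join: pets with at least one PetHobby row that passes the filter.
    static List<String> exists(Predicate<PetHobby> filter) {
        return PETS.stream()
                .filter(p -> PET_HOBBIES.stream()
                        .anyMatch(ph -> ph.petID() == p.petID() && filter.test(ph)))
                .map(Pet::name)
                .collect(Collectors.toList());
    }

    // Anti-join: pets with no PetHobby row that passes the filter.
    static List<String> notExists(Predicate<PetHobby> filter) {
        return PETS.stream()
                .filter(p -> PET_HOBBIES.stream()
                        .noneMatch(ph -> ph.petID() == p.petID() && filter.test(ph)))
                .map(Pet::name)
                .collect(Collectors.toList());
    }

    // Union that keeps first-seen order and drops duplicates.
    static Set<String> union(List<String> a, List<String> b) {
        Set<String> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(exists(ph -> true));                      // query 2
        System.out.println(notExists(ph -> true));                   // query 3
        System.out.println(notExists(ph -> ph.hobbyID() == 1));      // query 4
        System.out.println(exists(ph -> ph.hobbyID() == 1));         // query 5
        System.out.println(union(List.of("Judo"), List.of("Rock"))); // query 6
    }
}
```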

Now let’s try to add another row to the Hobby table by typing the following insert statement at the command prompt:

insert into Hobby (hobbyID, name, description) values (4, 'fly', 'Fly high for no particular reason!');

Note: opac-cli is by no means a replacement for cqlsh! The primary objective of the opac-cli utility is to provide a development environment for testing and running ad-hoc opac-syntax queries.

The schema defines Hobby.hobbyID as the primary key. So if a record with this hobbyID already exists, the insert results in a unique-key violation message; otherwise it succeeds.
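
This differs from a plain CQL INSERT, which silently overwrites an existing row with the same primary key unless IF NOT EXISTS is used. How OPAC enforces the check internally is not described here; the sketch below only illustrates the insert-if-absent behavior using a hypothetical in-memory map, with made-up class and method names.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of insert-if-absent semantics; not OPAC's actual implementation.
final class UniqueKeyInsert {
    // Hypothetical stand-in for the Hobby table, keyed by hobbyID.
    private final Map<Integer, String> hobbyNamesById = new HashMap<>();

    // Stores the row only if the key is free; otherwise reports a violation.
    void insert(int hobbyID, String name) {
        if (hobbyNamesById.putIfAbsent(hobbyID, name) != null) {
            throw new IllegalStateException(
                    "unique-key violation: hobbyID " + hobbyID + " already exists");
        }
    }

    public static void main(String[] args) {
        UniqueKeyInsert table = new UniqueKeyInsert();
        table.insert(4, "fly");      // first insert succeeds
        try {
            table.insert(4, "fly");  // second insert with the same key is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```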

As you have seen in this Pet example, no part of your Java code (PetBean, HobbyBean and PetHobbyBean) extends or implements any OPAC classes or interfaces! You can simply add a new component to a new module file or to the existing opac-pet-hobby-module.xml and initialize the new component by typing:

initialize schema;

Using OPAC Java Driver

Take a look at DriverUsageExample.java to understand how to save/retrieve Pet objects to/from Cassandra. Run DriverUsageExample as below:

java -cp .:opac-cli-all.jar DriverUsageExample ./opac-config.xml

OPAC Driver Unlimited Production-level Support

Get unlimited production-level support including but not limited to installation, configuration, schema definition, performance tuning, best practices recommendations, and top-priority feature implementation and bug fixes.

$29.99

Published by Sandeep S. Dixit