Archive

Archive for mei, 2010

What i have learned about Cassandra NoSQL DB

mei 12th, 2010 2 comments

NOTE: THIS IS FOR CASSANDRA 0.0.5 (OLD)

Normally I write in dutch, but because the rest of the world doesn’t understand dutch (you should learn our bastard language!), i’ll have to evolve to english (and more international) blogposting. (So don’t shoot me if i make major grammatical mistakes and don’t be a bitch about it!)

I’ll try to keep this post as up-to-date posible

So Cassandra is an NoSQL Database, that means, no SQL language.

I even dare to call it an improved memcached.

First of all i’ll post a couple of links where i learned a lot from

Documentation:

Examples:

Libraries:

There is a List of libraries on http://wiki.apache.org/cassandra/ClientOptions

I’m using the phpcassa lib. (yeah i use php, deal with it)

Since there is a lack of advanced beginner documentation, i have made an little example based on the “wtf is a supercolumn datamodel

$GLOBALS['THRIFT_ROOT'] = 'lib/classes/api/phpcassa/thrift/';
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/Cassandra.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TFramedTransport.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';

include_once('lib/classes/api/phpcassa/phpcassa.php');
include_once('lib/classes/api/phpcassa/uuid.php');

CassandraConn::add_node('127.0.0.1', 9160);

/**
 * Example based on the blog example of http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
 */
/**
 * what you see next, you need to add to your storage-conf.xml
**/
/**
<Keyspace Name="BloggyAppy">

	<!-- CF definitions -->
	<ColumnFamily CompareWith="BytesType" Name="Authors"/>
	<ColumnFamily CompareWith="BytesType" Name="BlogEntries"/>
	<ColumnFamily CompareWith="TimeUUIDType" Name="TaggedPosts"/>
	<ColumnFamily CompareWith="TimeUUIDType" Name="Comments"
		CompareSubcolumnsWith="BytesType" ColumnType="Super"/>
	<!-- other keyspace config stuff like replication values -->

</Keyspace>
**/

/**
 * Load the ColumnFamilies
 * Normally there you should define the consistancy levels, but that depends on the number of nodes/replication factor or what kind of data it is.
 */
echo "Loading ColumnFamilies..";
$authors = new CassandraCF('BloggyAppy', 'Authors', false, 'BytesType');
$blogentries = new CassandraCF('BloggyAppy', 'BlogEntries', false, 'BytesType');
$tagged = new CassandraCF('BloggyAppy', 'TaggedPosts', false, 'TimeUUIDType');
$comments = new CassandraCF('BloggyAppy', 'Comments', true, 'TimeUUIDType', 'BytesType');
echo "Done.\n";

/**
 * Insert example data
 */

echo "Inserting data..";
$authors->insert("Arin Sarkissian", array("numPosts" => 11, "twitter" => "phatduckk", "email" => "[email protected]", "bio" => "bla bla bla"));
///
$blogentries->insert("i-got-a-new-guitar", array("title" => "This is a blog entry about my new, awesome guitar", "body" => "this is a cool entry. etc etc yada yada", "author" => "Arin Sarkissian", "tags" => "life,guitar,music", "pubDate" => time(), "slug" => "i-got-a-new-guitar"));
$timeuuid_1 = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "i-got-a-new-guitar");
$tagged->insert("guitar", array($timeuuid_1 => "i-got-a-new-guitar"));
$tagged->insert("life", array($timeuuid_1 => "i-got-a-new-guitar"));
$tagged->insert("music", array($timeuuid_1 => "i-got-a-new-guitar"));
$tagged->insert("__notag__", array($timeuuid_1 => "i-got-a-new-guitar"));
//
$blogentries->insert("another-cool-guitar", array("title" => "This is a blog entry about my other guitar", "body" => "this is a cool entry. etc etc yada yada", "author" => "Arin Sarkissian", "tags" => "guitar", "pubDate" => time(), "slug" => "another-cool-guitar"));
$timeuuid_2 = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "another-cool-guitar");
$tagged->insert("guitar", array($timeuuid_2 => "another-cool-guitar"));
$tagged->insert("__notag__", array($timeuuid_2 => "another-cool-guitar"));
//
$blogentries->insert("scream-is-the-best-movie-ever", array("title" => "This is a blog entry about my favorite movie Scream!", "body" => "this is a cool movie entry. etc etc yada yada", "author" => "Arin Sarkissian", "tags" => "movie,horror,", "pubDate" => time(), "slug" => "scream-is-the-best-movie-ever"));
$timeuuid_3 = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "scream-is-the-best-movie-ever");
$tagged->insert("movie", array($timeuuid_3 => "scream-is-the-best-movie-ever"));
$tagged->insert("horror", array($timeuuid_3 => "scream-is-the-best-movie-ever"));
$tagged->insert("__notag__", array($timeuuid_3 => "scream-is-the-best-movie-ever"));
///
$timeuuid_1a = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "Joe Blow");
$comments->insert("scream-is-the-best-movie-ever", array($timeuuid_1a => array("commenter" => "Joe Blow", "email" => "[email protected]", "comment" => "you're a dumb douche, the godfather is the best movie ever", "commentTime" => time())));
//
$timeuuid_2a = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "Some Dude");
$comments->insert("scream-is-the-best-movie-ever", array($timeuuid_2a => array("commenter" => "Some Dude", "email" => "[email protected]", "comment" => "be nice Joe Blow this isnt youtube", "commentTime" => time())));
//
$timeuuid_1b = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "Johnny Guitar");
$comments->insert("i-got-a-new-guitar", array($timeuuid_1b => array("commenter" => "Johnny Guitar", "email" => "[email protected]", "comment" => "nice axe dawg...", "commentTime" => time())));
///
echo "Done.\n";

/**
 * Fetch data
 */

echo "\n##FETCHING DATA##\n";
$taggedposts = $tagged->get("__notag__"); // You can change this value
foreach ($taggedposts as $tpost){
	echo "Fetching post: ".$tpost."\n";
	$blogentry = $blogentries->get($tpost);
	print_r($blogentry);
	if (!empty($blogentry)){
		echo "Fetching author info: \n";
		$author = $authors->get($blogentry["author"]);
		print_r($author);
		$numcomments = $comments->get_count($tpost);
		echo "Fetching ".$numcomments." comments for ".$tpost.": \n";
		$commentposts = $comments->get($tpost);
		print_r($commentposts);
	}
	echo "\n##NEXT##\n";
}
echo "\n##END FETCHING DATA##\n";

/**
 * Get a range of posts
 */
echo "\n##GET ALL BLOGENTRIES, YOU CAN ALSO LIMIT##\n";
$blogs = $blogentries->get_range();
print_r($blogs);

///
echo "\nScript Done.\n";

This is a basic working example for the use of the cassandra Basics.
Later i will post some example’s of what i have done in my own production code, like using timestamps as indexes to get some data.
I’ll also post some stuff where i was stuck etc, it should help other people, so look for my future cassandra posts ^^