What I have learned about the Cassandra NoSQL DB

May 12th, 2010

NOTE: THIS IS FOR CASSANDRA 0.0.5 (OLD)

Normally I write in Dutch, but because the rest of the world doesn’t understand Dutch (you should learn our bastard language!), I’ll have to evolve to English (and more international) blog posting. (So don’t shoot me if I make major grammatical mistakes, and don’t be a bitch about it!)

I’ll try to keep this post as up to date as possible.

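So Cassandra is a NoSQL database, which means there is no SQL query language: you read and write data by key through a client library.

I even dare to call it an improved memcached: still a key/value store at heart, but persistent and spread over multiple nodes.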
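
To make the "lookup by key" idea a bit more concrete, here is a rough mental model using nothing but plain PHP arrays. It is only a sketch and not the phpcassa API: the variable names and the placeholder "timeuuid-1" / "timeuuid-2" column names are made up for illustration, and the data is borrowed from the blog example further down.

```php
<?php
// Rough mental model only: plain PHP arrays, NOT the phpcassa API.

// Standard column family: row key => (column name => value).
$authors = array(
    'Arin Sarkissian' => array(
        'twitter' => 'phatduckk',
        'email'   => 'arin@example.com',
    ),
);

// Super column family: row key => (super column name => (column name => value)).
// The super column names are placeholders; in Cassandra they would be TimeUUIDs.
$comments = array(
    'scream-is-the-best-movie-ever' => array(
        'timeuuid-1' => array('commenter' => 'Joe Blow'),
        'timeuuid-2' => array('commenter' => 'Some Dude'),
    ),
);

// A "query" is just a lookup by row key, much like a memcached get().
echo $authors['Arin Sarkissian']['twitter'], "\n";             // phatduckk
echo count($comments['scream-is-the-best-movie-ever']), "\n";  // 2
```

There is no WHERE clause and no JOIN here: if you want to find data in some other way, you have to store it a second time under another key, which is what the TaggedPosts column family in the example does.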

First of all, I’ll post a couple of links that I learned a lot from:

Documentation:

Examples:

Libraries:

There is a list of libraries at http://wiki.apache.org/cassandra/ClientOptions

I’m using the phpcassa lib. (Yeah, I use PHP, deal with it.)

Since there is a lack of advanced beginner documentation, I have made a little example based on the “WTF is a SuperColumn” data model post:

<?php
// Location of the Thrift library bundled with phpcassa; adjust to your own layout.
$GLOBALS['THRIFT_ROOT'] = 'lib/classes/api/phpcassa/thrift';
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/Cassandra.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TFramedTransport.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';

// phpcassa itself, plus its UUID helper (used below for the TimeUUID column names).
include_once('lib/classes/api/phpcassa/phpcassa.php');
include_once('lib/classes/api/phpcassa/uuid.php');

// Register the Cassandra node to talk to (9160 is the Thrift port).
CassandraConn::add_node('127.0.0.1', 9160);
 
/**
 * Example based on the blog example of http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
 */
/**
 * Add the keyspace definition below to your storage-conf.xml before running this script.
 **/
/**
<Keyspace Name="BloggyAppy">
 
    <!-- CF definitions -->
    <ColumnFamily CompareWith="BytesType" Name="Authors"/>
    <ColumnFamily CompareWith="BytesType" Name="BlogEntries"/>
    <ColumnFamily CompareWith="TimeUUIDType" Name="TaggedPosts"/>
    <ColumnFamily CompareWith="TimeUUIDType" Name="Comments"
        CompareSubcolumnsWith="BytesType" ColumnType="Super"/>
    <!-- other keyspace config stuff like replication values -->
 
</Keyspace>
**/
 
/**
 * Load the ColumnFamilies
 * Normally you would also define the consistency levels here, but those depend on the number of nodes, the replication factor and the kind of data.
 */
echo "Loading ColumnFamilies..";
$authors = new CassandraCF('BloggyAppy', 'Authors', false, 'BytesType');
$blogentries = new CassandraCF('BloggyAppy', 'BlogEntries', false, 'BytesType');
$tagged = new CassandraCF('BloggyAppy', 'TaggedPosts', false, 'TimeUUIDType');
$comments = new CassandraCF('BloggyAppy', 'Comments', true, 'TimeUUIDType', 'BytesType');
echo "Done.\n";
 
/**
 * Insert example data
 */
 
echo "Inserting data..";
$authors->insert("Arin Sarkissian", array("numPosts" => 11, "twitter" => "phatduckk", "email" => "arin@example.com", "bio" => "bla bla bla"));
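// Blog entry 1: store the entry, then index its slug under each of its tags (plus the catch-all "__notag__" row).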
$blogentries->insert("i-got-a-new-guitar", array("title" => "This is a blog entry about my new, awesome guitar", "body" => "this is a cool entry. etc etc yada yada", "author" => "Arin Sarkissian", "tags" => "life,guitar,music", "pubDate" => time(), "slug" => "i-got-a-new-guitar"));
$timeuuid_1 = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "i-got-a-new-guitar");
$tagged->insert("guitar", array($timeuuid_1 => "i-got-a-new-guitar"));
$tagged->insert("life", array($timeuuid_1 => "i-got-a-new-guitar"));
$tagged->insert("music", array($timeuuid_1 => "i-got-a-new-guitar"));
$tagged->insert("__notag__", array($timeuuid_1 => "i-got-a-new-guitar"));
// Blog entry 2
$blogentries->insert("another-cool-guitar", array("title" => "This is a blog entry about my other guitar", "body" => "this is a cool entry. etc etc yada yada", "author" => "Arin Sarkissian", "tags" => "guitar", "pubDate" => time(), "slug" => "another-cool-guitar"));
$timeuuid_2 = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "another-cool-guitar");
$tagged->insert("guitar", array($timeuuid_2 => "another-cool-guitar"));
$tagged->insert("__notag__", array($timeuuid_2 => "another-cool-guitar"));
// Blog entry 3
$blogentries->insert("scream-is-the-best-movie-ever", array("title" => "This is a blog entry about my favorite movie Scream!", "body" => "this is a cool movie entry. etc etc yada yada", "author" => "Arin Sarkissian", "tags" => "movie,horror", "pubDate" => time(), "slug" => "scream-is-the-best-movie-ever"));
$timeuuid_3 = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "scream-is-the-best-movie-ever");
$tagged->insert("movie", array($timeuuid_3 => "scream-is-the-best-movie-ever"));
$tagged->insert("horror", array($timeuuid_3 => "scream-is-the-best-movie-ever"));
$tagged->insert("__notag__", array($timeuuid_3 => "scream-is-the-best-movie-ever"));
// Comments: the row key is the post slug, and each comment is one super column named by a TimeUUID.
$timeuuid_1a = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "Joe Blow");
$comments->insert("scream-is-the-best-movie-ever", array($timeuuid_1a => array("commenter" => "Joe Blow", "email" => "joeb@example.com", "comment" => "you're a dumb douche, the godfather is the best movie ever", "commentTime" => time())));
//
$timeuuid_2a = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "Some Dude");
$comments->insert("scream-is-the-best-movie-ever", array($timeuuid_2a => array("commenter" => "Some Dude", "email" => "sd@example.com", "comment" => "be nice Joe Blow this isnt youtube", "commentTime" => time())));
//
$timeuuid_1b = UUID::generate(UUID::UUID_TIME,UUID::FMT_STRING, "Johnny Guitar");
$comments->insert("i-got-a-new-guitar", array($timeuuid_1b => array("commenter" => "Johnny Guitar", "email" => "guitardude@example.com", "comment" => "nice axe dawg...", "commentTime" => time())));
///
echo "Done.\n";
 
/**
 * Fetch data
 */
 
echo "\n##FETCHING DATA##\n";
$taggedposts = $tagged->get("__notag__"); // "__notag__" holds every post; use a real tag such as "guitar" to filter
foreach ($taggedposts as $tpost){
    echo "Fetching post: ".$tpost."\n";
    $blogentry = $blogentries->get($tpost);
    print_r($blogentry);
    if (!empty($blogentry)){
        echo "Fetching author info: \n";
        $author = $authors->get($blogentry["author"]);
        print_r($author);
        $numcomments = $comments->get_count($tpost);
        echo "Fetching ".$numcomments." comments for ".$tpost.": \n";
        $commentposts = $comments->get($tpost);
        print_r($commentposts);
    }
    echo "\n##NEXT##\n";
}
echo "\n##END FETCHING DATA##\n";
 
/**
 * Get a range of posts
 */
echo "\n##GET ALL BLOGENTRIES, YOU CAN ALSO LIMIT##\n";
$blogs = $blogentries->get_range();
print_r($blogs);
 
///
echo "\nScript Done.\n";
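
The script above repeats one `$tagged->insert()` call per tag by hand. A small helper could derive those calls from the comma-separated "tags" string instead. The sketch below is only an illustration: `index_post_by_tags()` and `FakeColumnFamily` are hypothetical names that are not part of phpcassa. The fake class just collects rows in memory so the snippet runs without a Cassandra node; with the real library you would pass in the `$tagged` column family from the script.

```php
<?php
// Hypothetical in-memory stand-in for a column family object; it only
// mimics the insert($key, $columns) call used in the script above.
class FakeColumnFamily
{
    public $rows = array();

    public function insert($key, $columns)
    {
        if (!isset($this->rows[$key])) {
            $this->rows[$key] = array();
        }
        foreach ($columns as $name => $value) {
            $this->rows[$key][$name] = $value;
        }
    }
}

// Hypothetical helper: index one post under each of its tags, plus the
// catch-all "__notag__" row. Returns the list of tag rows that were written.
function index_post_by_tags($tagged, $slug, $tags, $timeuuid)
{
    // Split on commas, trim whitespace and drop empty pieces (e.g. from a trailing comma).
    $names = array_filter(array_map('trim', explode(',', $tags)), 'strlen');
    $names[] = '__notag__';
    $names = array_values(array_unique($names));

    foreach ($names as $tag) {
        $tagged->insert($tag, array($timeuuid => $slug));
    }
    return $names;
}

// Usage with the fake column family; "timeuuid-3" is a placeholder for a real TimeUUID.
$fake = new FakeColumnFamily();
index_post_by_tags($fake, 'scream-is-the-best-movie-ever', 'movie,horror,', 'timeuuid-3');
print_r(array_keys($fake->rows));
```

Dropping the empty pieces matters because a stray trailing comma in the tags string would otherwise create a row with an empty key.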

The script above is a basic working example of the Cassandra basics.
Later I will post some examples of what I have done in my own production code, like using timestamps as indexes to get some data.
I’ll also post about the places where I got stuck; it should help other people, so look out for my future Cassandra posts ^^