What I have learned about the Cassandra NoSQL DB
NOTE: THIS IS FOR CASSANDRA 0.0.5 (OLD)
Normally I write in Dutch, but because the rest of the world doesn’t understand Dutch (you should learn our bastard language!), I’ll have to switch to English for more international blog posts. (So don’t shoot me if I make major grammatical mistakes, and don’t be a bitch about it!)
I’ll try to keep this post as up to date as possible.
So Cassandra is a NoSQL database, which means there is no SQL query language.
I even dare to call it an improved memcached.
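To make the memcached comparison a bit more concrete, here is a minimal sketch using nothing but plain PHP arrays (no Cassandra or memcached involved; the variable names and the `user:arin` key are made up for illustration). A memcached-style store maps a key to one opaque value, while a Cassandra row key maps to a set of named columns that can be addressed one by one.

```php
<?php
// memcached-style: one key, one opaque value (here a serialized blob).
$memcachedLike = array(
    'user:arin' => serialize(array('twitter' => 'phatduckk', 'numPosts' => 11)),
);

// Cassandra-style: one row key, many named columns.
$cassandraLike = array(
    'user:arin' => array('twitter' => 'phatduckk', 'numPosts' => 11),
);

// With the blob, the whole value has to be decoded to read one field.
$blob = unserialize($memcachedLike['user:arin']);
echo $blob['twitter'], "\n";

// With columns, a single column can be addressed by its name.
echo $cassandraLike['user:arin']['twitter'], "\n";
```

Both lines print the same value; the difference is only in how much has to be fetched and decoded to get there.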
First of all, here are a couple of links I learned a lot from.
Documentation:
- http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model – The most detailed page there is about the data model
- http://wiki.apache.org/cassandra/API – An overview of the possible API calls
- http://www.slideshare.net/jericevans/the-cassandra-distributed-database – The slides of the Cassandra presentation at FOSDEM; you need to read this!
- http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency – Slides about how replication works in Cassandra
Examples:
- http://github.com/ericflo/twissandra/ – A great example of how Cassandra’s data model works; it is a Twitter clone written in Python
Libraries:
There is a list of libraries at http://wiki.apache.org/cassandra/ClientOptions
I’m using the phpcassa lib. (Yeah, I use PHP, deal with it.)
Since there is a lack of documentation for advanced beginners, I have made a little example based on the “wtf is a supercolumn” data model article.
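Before the full script, here is a rough picture of the two shapes of data it uses, sketched with plain PHP arrays instead of a real Cassandra connection. The `countComments()` helper and the `timeuuid-…` placeholder names are made up for this sketch; in the real example the super column names are TimeUUIDs.

```php
<?php
// Standard column family: row key => (column name => value).
$authors = array(
    'Arin Sarkissian' => array(
        'twitter' => 'phatduckk',
        'email'   => 'arin@example.com',
    ),
);

// Super column family:
// row key => (super column name => (column name => value)).
$comments = array(
    'scream-is-the-best-movie-ever' => array(
        'timeuuid-1a' => array('commenter' => 'Joe Blow',  'email' => 'joeb@example.com'),
        'timeuuid-2a' => array('commenter' => 'Some Dude', 'email' => 'sd@example.com'),
    ),
    'i-got-a-new-guitar' => array(
        'timeuuid-1b' => array('commenter' => 'Johnny Guitar', 'email' => 'guitardude@example.com'),
    ),
);

// Hypothetical helper: number of super columns (comments) under a row key.
function countComments(array $columnFamily, $rowKey)
{
    return isset($columnFamily[$rowKey]) ? count($columnFamily[$rowKey]) : 0;
}

echo countComments($comments, 'scream-is-the-best-movie-ever'), "\n"; // prints 2
```

So a super column family is just one extra level of nesting: every blog post row holds a list of comments, and every comment holds its own columns.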
```php
$GLOBALS['THRIFT_ROOT'] = 'lib/classes/api/phpcassa/thrift/';
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/Cassandra.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TFramedTransport.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';

include_once('lib/classes/api/phpcassa/phpcassa.php');
include_once('lib/classes/api/phpcassa/uuid.php');

CassandraConn::add_node('127.0.0.1', 9160);

/**
 * Example based on the blog example of http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
 */

/**
 * what you see next, you need to add to your storage-conf.xml
 **/
/**
<Keyspace Name="BloggyAppy">
    <!-- CF definitions -->
    <ColumnFamily CompareWith="BytesType" Name="Authors"/>
    <ColumnFamily CompareWith="BytesType" Name="BlogEntries"/>
    <ColumnFamily CompareWith="TimeUUIDType" Name="TaggedPosts"/>
    <ColumnFamily CompareWith="TimeUUIDType" Name="Comments"
        CompareSubcolumnsWith="BytesType" ColumnType="Super"/>
    <!-- other keyspace config stuff like replication values -->
</Keyspace>
**/

/**
 * Load the ColumnFamilies
 * Normally there you should define the consistancy levels, but that depends on the number of nodes/replication factor or what kind of data it is.
 */
echo "Loading ColumnFamilies..";
$authors     = new CassandraCF('BloggyAppy', 'Authors', false, 'BytesType');
$blogentries = new CassandraCF('BloggyAppy', 'BlogEntries', false, 'BytesType');
$tagged      = new CassandraCF('BloggyAppy', 'TaggedPosts', false, 'TimeUUIDType');
$comments    = new CassandraCF('BloggyAppy', 'Comments', true, 'TimeUUIDType', 'BytesType');
echo "Done.\n";

/**
 * Insert example data
 */
echo "Inserting data..";
$authors->insert("Arin Sarkissian", array(
    "numPosts" => 11,
    "twitter"  => "phatduckk",
    "email"    => "arin@example.com",
    "bio"      => "bla bla bla"
));
///
$blogentries->insert("i-got-a-new-guitar", array(
    "title"   => "This is a blog entry about my new, awesome guitar",
    "body"    => "this is a cool entry. etc etc yada yada",
    "author"  => "Arin Sarkissian",
    "tags"    => "life,guitar,music",
    "pubDate" => time(),
    "slug"    => "i-got-a-new-guitar"
));
$timeuuid_1 = UUID::generate(UUID::UUID_TIME, UUID::FMT_STRING, "i-got-a-new-guitar");
$tagged->insert("guitar", array($timeuuid_1 => "i-got-a-new-guitar"));
$tagged->insert("life", array($timeuuid_1 => "i-got-a-new-guitar"));
$tagged->insert("music", array($timeuuid_1 => "i-got-a-new-guitar"));
$tagged->insert("__notag__", array($timeuuid_1 => "i-got-a-new-guitar"));
//
$blogentries->insert("another-cool-guitar", array(
    "title"   => "This is a blog entry about my other guitar",
    "body"    => "this is a cool entry. etc etc yada yada",
    "author"  => "Arin Sarkissian",
    "tags"    => "guitar",
    "pubDate" => time(),
    "slug"    => "another-cool-guitar"
));
$timeuuid_2 = UUID::generate(UUID::UUID_TIME, UUID::FMT_STRING, "another-cool-guitar");
$tagged->insert("guitar", array($timeuuid_2 => "another-cool-guitar"));
$tagged->insert("__notag__", array($timeuuid_2 => "another-cool-guitar"));
//
$blogentries->insert("scream-is-the-best-movie-ever", array(
    "title"   => "This is a blog entry about my favorite movie Scream!",
    "body"    => "this is a cool movie entry. etc etc yada yada",
    "author"  => "Arin Sarkissian",
    "tags"    => "movie,horror,",
    "pubDate" => time(),
    "slug"    => "scream-is-the-best-movie-ever"
));
$timeuuid_3 = UUID::generate(UUID::UUID_TIME, UUID::FMT_STRING, "scream-is-the-best-movie-ever");
$tagged->insert("movie", array($timeuuid_3 => "scream-is-the-best-movie-ever"));
$tagged->insert("horror", array($timeuuid_3 => "scream-is-the-best-movie-ever"));
$tagged->insert("__notag__", array($timeuuid_3 => "scream-is-the-best-movie-ever"));
///
$timeuuid_1a = UUID::generate(UUID::UUID_TIME, UUID::FMT_STRING, "Joe Blow");
$comments->insert("scream-is-the-best-movie-ever", array(
    $timeuuid_1a => array(
        "commenter"   => "Joe Blow",
        "email"       => "joeb@example.com",
        "comment"     => "you're a dumb douche, the godfather is the best movie ever",
        "commentTime" => time()
    )
));
//
$timeuuid_2a = UUID::generate(UUID::UUID_TIME, UUID::FMT_STRING, "Some Dude");
$comments->insert("scream-is-the-best-movie-ever", array(
    $timeuuid_2a => array(
        "commenter"   => "Some Dude",
        "email"       => "sd@example.com",
        "comment"     => "be nice Joe Blow this isnt youtube",
        "commentTime" => time()
    )
));
//
$timeuuid_1b = UUID::generate(UUID::UUID_TIME, UUID::FMT_STRING, "Johnny Guitar");
$comments->insert("i-got-a-new-guitar", array(
    $timeuuid_1b => array(
        "commenter"   => "Johnny Guitar",
        "email"       => "guitardude@example.com",
        "comment"     => "nice axe dawg...",
        "commentTime" => time()
    )
));
///
echo "Done.\n";

/**
 * Fetch data
 */
echo "\n##FETCHING DATA##\n";
$taggedposts = $tagged->get("__notag__"); // You can change this value
foreach ($taggedposts as $tpost) {
    echo "Fetching post: " . $tpost . "\n";
    $blogentry = $blogentries->get($tpost);
    print_r($blogentry);
    if (!empty($blogentry)) {
        echo "Fetching author info: \n";
        $author = $authors->get($blogentry["author"]);
        print_r($author);
        $numcomments = $comments->get_count($tpost);
        echo "Fetching " . $numcomments . " comments for " . $tpost . ": \n";
        $commentposts = $comments->get($tpost);
        print_r($commentposts);
    }
    echo "\n##NEXT##\n";
}
echo "\n##END FETCHING DATA##\n";

/**
 * Get a range of posts
 */
echo "\n##GET ALL BLOGENTRIES, YOU CAN ALSO LIMIT##\n";
$blogs = $blogentries->get_range();
print_r($blogs);
///
echo "\nScript Done.\n";
```
This is a basic working example that covers the Cassandra basics.
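The comment in the example says that the consistency level depends on the number of nodes and the replication factor. As a rough sketch of the arithmetic behind that (plain PHP, no phpcassa calls; both function names are made up for illustration): a quorum is a majority of the replicas, and a read is guaranteed to overlap with the latest acknowledged write when the number of replicas read plus the number of replicas written is larger than the replication factor.

```php
<?php
// Hypothetical helper: replicas needed for a quorum (majority) of $rf replicas.
function quorum($rf)
{
    return (int) floor($rf / 2) + 1;
}

// Hypothetical helper: true when every read set must overlap every write set,
// i.e. readReplicas + writeReplicas > replication factor.
function readSeesLatestWrite($readReplicas, $writeReplicas, $rf)
{
    return $readReplicas + $writeReplicas > $rf;
}

$rf = 3;
$q  = quorum($rf); // 2 of 3 replicas

var_dump(readSeesLatestWrite($q, $q, $rf)); // bool(true):  quorum reads + quorum writes
var_dump(readSeesLatestWrite(1, 1, $rf));   // bool(false): one replica each may miss the write
```

With a single node and a replication factor of 1, as in a local test setup, every level behaves the same, which is probably why it is easy to skip at first.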
Later I will post some examples of what I have done in my own production code, like using timestamps as indexes to get some data.
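This is not that production code, only a guess at the general idea, sketched with plain PHP arrays: one index row whose column names are timestamps and whose values point at the real rows. The timestamps and the `sliceByTime()` helper are invented for the sketch.

```php
<?php
// One index row: column name = Unix timestamp, column value = key of the real row.
$postsByTime = array(
    1265000300 => 'scream-is-the-best-movie-ever',
    1265000100 => 'i-got-a-new-guitar',
    1265000200 => 'another-cool-guitar',
);

// Cassandra keeps the columns of a row sorted by the column family's
// comparator; ksort() imitates that ordering here.
ksort($postsByTime);

// Hypothetical helper: return the columns whose timestamp lies in [$from, $to].
function sliceByTime(array $row, $from, $to)
{
    $result = array();
    foreach ($row as $timestamp => $value) {
        if ($timestamp >= $from && $timestamp <= $to) {
            $result[$timestamp] = $value;
        }
    }
    return $result;
}

// Everything posted between the two timestamps, oldest first.
print_r(sliceByTime($postsByTime, 1265000100, 1265000200));
```

Because the columns are kept in order, getting "the posts between A and B" becomes a slice over one row instead of a scan over all posts.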
I’ll also post about some things I got stuck on; it should help other people, so look out for my future Cassandra posts ^^