Storing hierarchical data in a database Part 2a: Modified preorder tree traversal

Second part in this series of discussions/tutorials on storing hierarchical data in a database. You can read the first on here. The original article I read that got me started on these ideas can be found here (you’ll notice the tree diagrams are similar). Most credit should be given to Gijs Van Tulder; I simply modified some of his code and added to his explanation where appropriate.

This time we’re going to take a look at the basics of the modified preordered tree traversal algorithm.

Consider the tree used in part 1, which uses the adjacency list model:

[table: animal_tree, adjacency list model]
| id  | parent_id   | name
|-----+-------------+-------------
| 1   |  0          | Animals
| 2   |  1          | Fish
| 3   |  1          | Mammals
| 4   |  2          | Marine
| 5   |  2          | Fresh Water
| 6   |  4          | Halibut
| 7   |  5          | Rainbow Trout
| 8   |  3          | Dog
| 9   |  3          | Cat
|-----+-------------+-------------

What the modified preordered method does is store the relationships between neighboring nodes in order to recreate the tree structure. We do this by storing “lft” and “rght” (“left” and “right” are reserved SQL keywords) values at each node; as you descend a tree, the left-hand values increase, and likewise, as you ascend the right-hand values increase. It’s hard to visualize until you see it in action:

tree_diag_image-lr_arrow.gif

Pay special attention to the sequence in which these nodes are ordered, as shown by the arrows.

Here’s the presorted table for the above tree:

[table: animal_tree, presorted model]
| lft | rght        | name
|-----+-------------+-------------
| 1   |  18         | Animals
| 2   |  11         | Fish
| 12  |  17         | Mammals
| 3   |  6          | Marine
| 7   |  10         | Fresh Water
| 4   |  5          | Halibut
| 8   |  9          | Rainbow Trout
| 13  |  14         | Dog
| 15  |  16         | Cat
|-----+-------------+-------------

…and one of the first things you probably want to do with your new presorted data? Create a tree. First things first, though. The code behind it is more difficult to understand than the adjacency model but the overhead is greatly reduced because you only need two queries for to retrieve tree information from the database (adjacency uses a new query every time you recursively enter the retrieve function).

In its simplest form the queries are:
(retrieves the left and right values for the starting node)
SELECT lft,rght FROM animal_tree WHERE name="$name";
SELECT * FROM animal_tree WHERE lft BETWEEN $row['lft'] AND $row['right'] ORDER BY lft ASC;
…and that’s it.

Take a look at the code (modified from the example in the Sitepoint article):

//I use my own database classes, but it should be easy enough to follow what's happening.
function get_tree($name="Animal",&$DB) {

$result=$DB->query("SELECT lft,rght FROM animal_tree WHERE name="$name"");
$row=$DB->fetchRow($result);
$result2=$DB->query("SELECT * FROM animal_tree WHERE lft BETWEEN {$row['lft']} AND {$row['rght']} ORDER BY lft ASC");

$right = array();

$row=$DB->fetchAssoc($result2);

$ii=0; //start counter at 0
foreach($row as $row) {
if(count($right)>0) {

/* $right[max#] is the last array element, which holds the
* rght value from the previous row, which is a parent in
* many cases, but if not a parent, it keeps popping-off
* the last element of the $right array - going back up
* the tree until $row[right] (current node) finds its
* parent (the parents rght value is greater than the
* child's rght)
*/
while($right[count($right)-1]$value)
{
print(str_repeat(' ',$value['indent'])."({$value['lft']})".$value['node']."({$value['right']})n");
}
}

…and in-case you want the path to a particular node:
SELECT * FROM animal_tree WHERE lft$rght ORDER BY lft ASC

..to count the number of descendants: $d=(rght-lft-1)/2

Thanks, again to Gijs for the excellent tutorial. Next time we will explore some ways to modify and manipulate the tree.

Security: Don’t blindly trust $_SERVER variables

Article one: Chris Shiflett: SERVER_NAME Versus HTTP_HOST
Article two: Sean Coates: XSS Woes

I thought both were quite insightful, though Sean’s stands-out more because PHP_SELF is used so often (be sure to check-out his phpinfo(); example at the end of the post). One method, as he explained, is to put the $_SERVER[‘PHP_SELF’] variable into htmlentities() so at least you aren’t potentially echoing dangerous output.

Heed Chris’s advice: treat $_SERVER variable just as you would $_GET and $_POST – all have the potential to be tainted.

Digital Fish Library in the news!

The DFL is in the news again!
http://www.digitalfishlibrary.org/the_news/

Included in the most recent updates: News segment from NBC 7/39 in San Diego, and the press release at the SIO website.

Also, link to our little spot in Nature (one of the most prestigious scientific journals in the world). Nature 440, 396-397 (23 March 2006)

~Cameron

The state of PHP Frameworks

Below is a comment I posted on a friend’s dev blog. Looks like he’s turning RoR because RoR just seems pretty sweet to him (originally a Perl programmer, so he welcomes the improvements Ruby’s brought to the field). Just a bit of background: Doug is my right-hand man on Flood’s new website. He’s pretty new to PHP, but has done a really good job picking it up. I should also mention he’s a FANTASTIC DB guy. If you want to query it, he can probably figure it out.

———————

PHP has several pretty good frameworks, but that shouldn’t be a problem for people in the PHP community. We’re used to having options, and lots of them. After listening to a few talks about the new Zend Framework, I think it’s going to be a pretty big deal fairly soon. What’s most exciting, for me at least, is the fact that the community isn’t being snobbish about the ways things should be done – they aren’t sticking to any one game plan (like copying one language or another). Rather, they’re pulling the best features from many languages and combining them into the framework. So RoR may be great, but it also has shortcomings to it, which the ZF community is making sure they don’t repeat.

RoR has some great things about it, or so I hear. I don’t doubt it one bit, but there is soon to be stiff competition from the PHP end. I also like to console my self, knowing that the RoR camp tends to have a few very vocal cheerleaders, so maybe it’s not as big as it may seem. But for now, Ruby just happens to have the “best” (most developed?) framework on the web. I say good for Ruby. In the end what matters most is that FOSS becomes better by building on the strengths of other projects.

–end post–

Most popular posts

So far it looks like my NVDIA Fedora Core 4 video driver install post is by far the most popular. Before we know it, I’ll probably end up doing a Fedora Core 5 NVIDIA video driver install tutorial (though I can’t imagine it would be much different).

Coming in a close second is the rsync backup tutorial from not so long ago. It has actually been working really well so far. The scripts run faithfully at their predestined times and about 2.5 hours later I have a full backup.

A distant third is the Zend Framework review I wrote. Honestly, there probably aren’t a whole lot of people interested in the topic, and I could have done a better job. There just seemed to be some problems installing it that went unresolved (and I didn’t quite have the time to figure them all out before my mini review). I have some new fodder for discussion, though, now that my good friend Doug is now developing a site in RoR for work.

Please comment on the content of this blog. I’d really like to begin posting more about things people are interested in. Certainly I’ll continue to post on all sorts of things I find myself doing, but it also wouldn’t hurt to expand on some more than others. One thing to come will be deploying a site on a 50-node rackmount server farm for the DFL. We’re not using 50 nodes because we think we’ll have enough web traffic to utilize that much horsepower. Rather, it’s for web-based scientific computing – all controlled through the PHP-based site.

So… let me know what you want to see!

Storing hierarchical data in a database: Part 1

(Click to see part 2a and 2b)
In this first of two (or three) posts, I’m going to present a fairly conventional way of representing hierarchical data in a database. (The original article I read that got me interested on posting these ideas can be found here, hence the similarity in the tree structure).

My DB of choice is MySQL, so you may have to modify your queries slightly for other DB engines.

Before we even begin, we should consider why we’d even want to do something like that – what can you even use it for? Animals. It’s something we can all relate to. If, for some reason, you needed to catalog the relationships of different animals to one another, you’d have to come-up with some sort of tree diagram to make sense of it all. Take a look at this one, for example:

What else can you say about it? Pretty simple. I particularly like fish, so I expanded a little bit there (later we’ll see how we can expand our tree to include different breeds of dogs and cats). Also notice that you can really create many different types of trees. One tree that I’m working on, for example, will demonstrate the taxonomic relationships between different species of fish for the DFL. Often times the employee/supervisor scheme is used for demonstration purposes.

The first method of storing our data is by simply indicating to which parent does each item belongs. Take a look at this table:

[table: animal_tree]
| id  | parent_id   | name        
|-------------------------------   
| 1   |  0          | Animals         
| 2   |  1          | Fish           
| 3   |  1          | Mammals   
| 4   |  2          | Marine       
| 5   |  2          | Fresh Water
| 6   |  4          | Halibut       
| 7   |  5          | Rainbow Trout
| 8   |  3          | Dog          
| 9   |  3          | Cat           
|---------------------------------

*Note: I chose to have parent=0 to mean that’s the top-level of our tree (there is no row id 0).

That wasn’t too painless, now was it? The parent_id column basically points to the node (location in the tree) to which that particular item belongs. For example, cat (id 9) has a parent_id 3, which is the id of “Mammal.” Using this information we can reconstruct the tree above (it probably won’t look quite like that, but the structure of the data is what we’re concerned with).

There’s essentially only one basic query we need to achieve our goals:
$query="SELECT * FROM animal_tree where parent_id=$id";
Easy, huh? but it doesn’t really do anything for us. Say we wanted to select Animals->Fish->Marine->all, how would we do that?

 SELECT * FROM animal_tree where parent_id=0 (selects Animals)
 SELECT * FROM animal_tree where parent_id=1 (selects Fish)
 SELECT * FROM animal_tree where parent_id=2 (selects Marine)
 SELECT * FROM animal_tree where parent_id=4 (selects all who's parent is "Marine")

In order to make this a bit more automated, since that’s really what we’re trying to do, we need to make use of a recursive function. What is that? you may ask… Well, it’s essentially a function that, at some point, calls itself from within. Here’s a quick example:

function recursive_add($number) {
    $number++;
    echo $number.", ";
    if($number==100) {
        die("All done!");
    }
    else {
        recursive_add($number);
    }
}

recursive_add(0);
//will print 1, 2, 3, 4, 5, ..., 100

This function keeps calling itself until $number equals 100, at which point it kills the script, telling us we’re all done. If we’re going to create a recursive function for our tree, we’re going to need a few modifications to our code first…

function build_tree($parent_id=0,$depth=0) {
    //run our query. by default starts at top of 
    //the tree (with zero depth)
    $result=mysql_query("SELECT * FROM animal_tree "
                                   ."WHERE parent=$parent_id");

    //display each child row
    while($row=mysql_fetch_assoc($result) {
        //indent as necessary and print name
        print str_repeat('....',$level).$row['name']."n";

        //we want to see all the children that belong to this 
        //node... so we need to call *this* function again
       build_tree($row['id'],$level++);
    }
}

That wasn’t too painless, now was it? If this still seems a little to difficult or abstract, don’t worry. Recursive function seem exactly that when you first learn them.

If you ever wanted to find the path to “Halibut,” you can create a similar function that find the parent_id of “halibut”, run the function to find the parent’s parent, so on and so forth.

…enjoy.

Digital Fish Library Press Release

Today is the big day – our press release is now out, and even featured on the SIO (Scripps Institution of Oceanography) website. In just under an hour we’ll begin scanning a juvenile great white shark as part of the press “event.”

If you haven’t read my other posts on the DFL, here’s a brief recap: We’ve been given an NSF grant to digitize most genera of marine fish (some freshwater) in the world using magnetic resonance imaging (MRI). The end goal of this has several application points: 1) Fish anatomy is digitized, so researchers can perform virtual dissections and explore the fish’s organs without physically destroying the real specimens; 2) These data, along with software tools we’re creating, will be used as teaching tools in high school education programs, provided by the Birch Aquarium as Scripps, 3) the availability of data will encourage conservation practices, as destroying real tissues will become less and less a necessity to perform common anatomical research – something especially important when dealing with rare specimens.

The press release is at http://scrippsnews.ucsd.edu/article_detail.cfm?article_num=718

More importantly, the DFL website is located at http://www.digitalfishlibrary.org/

Enjoy…

Zend Framework Quick Review

First off, I’ve been having some problems getting things up and running due to a few errors in the code. Tried some of the fix suggestions by Andi Gutmans here, but to no avail. I also didn’t try too hard to correct these problems. After all, I was more interested in seeing how the code works than I was actually seeing work.

Want to do a Flickr search? Five lines of code!

require_once 'Zend/Services/Flickr.php';
$flickr=new Zend_Service_Flickr('api_key_here');
$photos="$flickr->tagSearch('tag');
foreach($photos as $photo){
echo'Thumbnail->uri .'" />
';
echo $photo->title."
n";

To import an RSS feed:

require_once('Zennd/Feed.php');
$feed = Zend_Feed::import('http://news.google.com/?output=rss');
foreach($feed->items as $item) {
echo "

".$item->title()."
";
echo $item->link() . "

";
}

So, by these code examples alone there’s a lot of promise for this framework, especially considering the amount of development overhead involved in coding similar functionality in your own applications. From the code I’ve seen these demos I’ve copied here are fairly representative of how the Framework is used. At the moment its much too early to begin using the Framework in a production environment, but keep an eye on its development, as it will soon be one of THE tools for rapid PHP app development.

One final note: Check-out the webcast I linked to in a previous post for more details. No point in me duplicating the content here. Check back in the future for further reviews as this Framework develops.

Peace.

Newsvine – out of beta

Well folks, Newsvine is finally out of private beta!

It’s a great spin on aggregating news over the net. First, it claims to put wire news up faster than any other site because it skips all the editorial processes that most news websites (think CNN) go through before anything makes a page. Thousands of articles are instantly available from the AP, ESPN, and others.

Second, users have the chance to seed articles. In other words, you can point other readers to news elsewhere on the net that you find particularly interesting or worthwhile.

Third, it allows users to submit their own articles – much like a type of blog. But it’s more than just a blog. Consider it a place to write news articles and editorial pieces for others to read. Gain a following. Earn money through advertising click-throughs.

One thing that really makes this site stand-out is the amount of news on it… and it’s well-organized. Really well organized. Users also have the opportunity to vote for articles, pushing them up the page. In other words, the more popular content gets more exposure.

Check it out! Newsvine

Hey there! Come check out all-new content at my new mistercameron.com!