Nested Set Trees

Introduction

A common technique to manage hierarchical data in a relational database is to use an Adjacency List model, where you have both an ID column and a Parent ID column in a table. This is easy to understand and maintain but can be difficult or inefficient when you want to retrieve hierarchies of records. The Nested Set model provides an alternative technique for managing this kind of data and is more efficient at reading a hierarchy but requires more work for inserts, updates, deletes and moves. NSTree is a library of ColdFusion code that you may like to use to help manage your hierarchical data using the nested set model.

The current version of the library is 0.8. Please leave any comments on this version here: http://stannard.net.au/blog/index.cfm/2008/8/25/Nested-Set-Trees-in-ColdFusion-v08.

Prerequisites

Before using the library you will need to have a good understanding of the Nested Set data model. See the references below for some good links that describe the Nested Set model in more detail, or just try a search in Google.

About the Code

The functions provided and underlying algorithms are based on a PHP/MySQL implementation by Rolf Brugger (thanks Rolf!) and the Nested Set theory was developed by Joe Celko. This code is also based around creating an additional "partner" table that is used with an existing data table. The library should work in ColdFusion 6.1 and above and any common database but I have only tested in MS SQL Server.

Download the Code

You can download this code from RIAForge

http://nstree.riaforge.org

This download file contains the three main Nested Set Tree files in the /nstree/com/nstree directory:

  • NestedSetTreeTable.cfc - Represents a database table used for storing nested set tree data.
  • NestedSetTree.cfc - Library of functions to manipulate a nested set tree.
  • Node.cfc - Represents a node in the tree that also has functions to manipulate the tree.

These are the only files you need when using the library in your own application.

Installing the demo

The following steps should get the demo working for you:

  1. Download the NSTree code from http://nstree.riaforge.org
  2. Unzip the file and place the nstree directory somethere in your web path (the demo will run in a subdirectory or in a root directory)
  3. Create a new database that you can use for this demo (or you can use an existing database)
  4. Create a ColdFusion datasource that points to your database. The demo has a default datasource called nstree configured.
  5. Run the nstree.mssql.sql script (from the root of the demo nstree folder) on your database. This will create two tables named category and categoryNST.
  6. Open your web browser and browse to the nstree directory.

Sorry, I only have an Microsoft SQL version of the script available at the moment, but see the following section for details of the tables that need to be created.

Understanding the Nested Set Tree tables

This library is based around the concept of having a primary data table and a partner Nested Set Tree (NST) table. To better understand this let's describe the demo category tables:

Primary table: category

Column Name Column Type Description
categoryId int, PK Unique primary key
parentCategoryId int Reference to parent category record
categoryName varchar(50) Category name

Note that the parentCategoryId column is not required to use the NST library, but can be useful to maintain.

For each record in the primary category table you will need to maintain a partner record in a NST table.

Partner table: categoryNST

Column Name Column Type Description
treeId int, PK Unique id for a complete nested set tree. Part of compound primary key.
id int, PK Unique id of a node within a nested set tree. Part of compound primary key. This column is the foreign key to the primary category table.
lft int Left index
rgt int Right index

Notice that we are using a compound primary key on the treeId and id columns.

The demo script also creates an index on the treeId, id and lft columns. This will significantly improve the performance of most reads to the NST table.

About Locking and Exceptions

The library is designed to be multi-user safe but you will still need to consider locking in your own code that manages the primary data table. The examples that follow will show how you might implement this.

The library has a focus on robustness over performance and additional queries are performed to ensure that operations are valid at the time they are run, such as checking a parent node exists before attempting to append children to it. Exceptions will be thrown if these preconditions are not met.

Thanks to Adam Cameron for raising this issue.

The library provides four lock related functions that you may use:

setLockName(name) Sets the name of the lock.
getLockName() Returns the name of the lock.
setLockTimeout(seconds) Sets the lock timeout.
getLockTimeout() Returns the current lock timeout.

Functions that read data use a READONLY lock. Functions that change data use an EXCLUSIVE lock.

In your code, you can use locking as follows.

<cflock
   name="#variables.nst.getLockName()#"
   timeout="#variables.nst.getLockTimeout()#"
   type="exclusive">

   <!--- Your code plus calls to the library here --->
</cflock>

The default lock name is "NESTEDSETTREE-NSTTableName-NSTTreeId".

About Transactions

No transactions are implemented within the library. You should implement transactions in the code that uses the library.

Nested Set Tree Node Orientation

The node orientation within the tree uses the following terminology. The directions are from the perspective of 'Node':

The NST library objects

There are three NST library objects:

  • NestedSetTreeTable
  • NestedSetTree
  • Node

NestedSetTreeTable

The first object you need to create is the NestedSetTreeTable object. You must create one for each table that will hold nested set data. These may be stored in the application scope.

The object requires the following parameters when created:

Parameter Requirement Description
datasourceName Mandatory The datasource name.
datasourceUsername Optional The datasource user name. Default is empty string.
datasourcePassword Optional The datasource password. Default is empty string.
table Mandatory The name of the NST table.
treeId Optional The name of the NST table Tree Id column. Default value is "treeId".
id Optional The name of the NST table Id column. Default value is "id".
left Optional The name of the NST table left column. Default value is "lft".
right Optional The name of the NST table right column. Default value is "rgt".

Example of creating the nested set tree table object:

<cfset variables._nstTable = createObject("component","NestedSetTreeTable").init(
               "nstree",
               "nstreeuser",
               "nstreepass"
               "categoryNST",
               "id",
               "lft",
               "rgt"
            )>

Or you can use the simpler version if you use the default column names:

<cfset variables._nstTable = createObject("component","NestedSetTreeTable").init(
               datasourceName="nstree",
               tableName="categoryNST"
            )>

The NestedSetTreeTable has the following functions available:

Function Description
getNestedSetTree(treeId) Returns a NestedSetTree object with the specified Tree Id. The treeId parameter is optional and defaults to the value 0. You can use this value if you do not plan to use multiple trees within your table.
deleteAllTrees() Deletes all trees from the database.

NestedSetTree

The main object you need to use is the NestedSetTree. This object contains all of the functions you need to manipulate you tree. You create NestedSetTree object from the NestedSetTreeTable:

<cfset variables._nst1 = variables._nstTable.getNestedSetTree(1)>
<cfset variables._nst2 = variables._nstTable.getNestedSetTree(2)>

Or if your table is only expected to contain one tree:

<cfset variables.nst = variables._nstTable.getNestedSetTree()>

The following examples will use a simple private function nst() as an alias to the NestedSetTree object:

<cffunction name="nst" output="false" access="private">
   <cfreturn variables._nst>
</cffunction>

Node

The Node object is created whenever you read a record from the NestedSetTree. For example:

<cfset node = nst().getNode(10)>

A Node has the following functions available:

Function Description
getTree() Returns the NestedSetTree associated with the node.
getTreeId() Returns the Tree Id of the node.
getId() Returns Id of the node.
getLeft() Returns Left value of the node.
getRight() Returns Right value of the node.
setTree(tree) Sets the NestedSetTree of the node.
setId(id) Sets the Id of the node.
setLeft(left) Sets the Left value of the node.
setRight(right) Sets the Right value of the node.
reload() Updates the nodes left and right values from the database.
invalidate() Sets a node's left and right values to 0 which represent an invalid node.
toStruct() Returns a struct value containing the treeId, id, left and right values.
toStringValue() Returns a string representation of the node.

Creating the root node of a new tree

Nested set trees need to have a top level root node which is the parent of all other nodes. When you have no nodes in your tree, then you can use the following code to create your first record.

If you are converting an existing data table to use the NST model, then see "Creating a NST for existing data" below.

To create a new root node in the NST, use the NestedSetTree newRoot() function

Function Description
newRoot(id) Creates a new root node and returns a Node object.

Example code:

<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="exclusive">
<cftry>
<cftransaction>

<!--- Create the category record. --->
<cfquery result="q" ... >
   insert into category
   (
      parentCategoryId,
      categoryName
   )
   values
   (
      <cfqueryparam cfsqltype="cf_sql_integer" value="#parentCategoryId#">,
      <cfqueryparam cfsqltype="cf_sql_varchar" value="#categoryName#">
   )
</cfquery>
<cfset categoryId = q.IDENTITYCOL>

<!--- Update the NST. --->
nst().newRoot(categoryId)>

</cftransaction>
<cfcatch>
<!--- Handle error here --->
</cfcatch>
</cftry>
</cflock>

Retrieving nodes from the tree

The NestedSetTree object provides the following functions to retrieve nodes.

NestedSetTree Function Description
root() Returns the root node.
getNode(id) Returns a node with the specified id
firstChild(node) Returns the first child of the specified node.
lastChild(node) Returns the last child of the specified node.
prevSibling(node) Returns the previous sibling of the specified node.
nextSibling(node) Returns the next sibling of the specified node.
ancestor(node) Returns the parent of the specified node.

The Node object has the following functions to retrieve nodes.

Node Function Description
firstChild() Returns the node's first child.
lastChild() Returns the node's last child.
prevSibling() Returns the node's previous sibling.
nextSibling() Returns the node's next sibling.
ancestor() Returns the node's parent.

These functions always return a node. When you have the result node, you should test if the node is valid.

Testing for a valid node

Use isValidNode() to test if a node is valid. An invalid node is essentially equivalent to receiving a null or undefined value.

On a NestedSetTree use:

NestedSetTree Function Description
isValidNode(node) Returns true if the specified node is valid.

On a Node use:

Node Function Description
isValidNode() Returns true if the node is valid.

Querying for nodes in the tree

The following functions return a true or false value.

For the NestedSetTree:

NestedSetTree Function Description
hasAncestor(node) Returns true if the node has a parent.
hasPrevSibling(node) Returns true if the node has a previous sibling
hasNextSibling(node) Returns true if the node has a next sibling.
hasChildren(node) Returns true if the node has children.
isRoot(node) Returns true if the node is a root node.
isLeaf(node) Returns true if the node is a leaf node.
isChild(node1,node2) Returns true, if 'node1' is a direct child or in the subtree of 'node2'
isChildOrEqual(node1,node2) Returns true, if 'node1' is a direct child or in the subtree of 'node2' or is the same node as 'node2'
isEqual(node1,node2) Returns true if 'node1' and 'node2' are the same node.

For the Node use:

Node Function Description
hasAncestor() Returns true if the node has a parent.
hasPrevSibling() Returns true if the node has a previous sibling
hasNextSibling() Returns true if the node has a next sibling.
hasChildren() Returns true if the node has children.
isRoot() Returns true if the node is a root node.
isLeaf() Returns true if the node is a leaf node.
isChild(node) Returns true, if the node is a direct child or in the subtree of 'node'
isChildOrEqual(node) Returns true, if the node is a direct child or in the subtree of 'node' or is the same node as 'node'
isEqual(node) Returns true if the node is the same as the 'node' parameter.

Adding new nodes

The following functions are provided to add nodes to a tree.

NestedSetTree Function Description
newFirstChild(node,id) Creates a new node before all other children.
newLastChild(node,id) Creates a new node after all other children.
newPrevSibling(node,id) Creates a new node before the current node.
newNextSibling(node,id) Creates a new node after the current node.

For Node objects use:

Node Function Description
newFirstChild(id) Creates a new node before all other children.
newLastChild(id) Creates a new node after all other children.
newPrevSibling(id) Creates a new node before the current node.
newNextSibling(id) Creates a new node after the current node.

Example code to add new children

<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="exclusive">
<cftry>
<cftransaction>

<!--- Create the category record. --->
<cfquery result="q" ... >
   insert into category
   (
      parentCategoryId,
      categoryName
   )
   values
   (
      <cfqueryparam cfsqltype="cf_sql_integer" value="#parentCategoryId#">,
      <cfqueryparam cfsqltype="cf_sql_varchar" value="#categoryName#">
   )
</cfquery>
<cfset categoryId = q.IDENTITYCOL>

<!--- Update the NST. --->
<cfset nst().newLastChild(parentCategoryId, categoryId)>

</cftransaction>
<cfcatch>
<!--- Handle error here --->
</cfcatch>
</cftry>
</cflock>

Deleting nodes

The NestedSetTree object provides the following delete functions:

NestedSetTree Function Description
delete(node) Deletes the specified node.
deleteTree() Deletes all nodes in the tree.

The Node object provides just a single function:

Node Function Description
delete() Deletes the node.

Example code to delete a node:

<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="exclusive">
<cftry>
<cftransaction>

<!--- Get the node to delete --->
<cfset node = nst().getNode(categoryId)>

<!--- Delete the category and all children. --->
<cfquery ... >
   delete cat
   from
      category cat
      inner join categoryNST nst on cat.categoryId = nst.categoryId
   where
      nst.lft >= <cfqueryparam cfsqltype="cf_sql_integer" value="#node.getLeft()#">
      and
      nst.rgt <= <cfqueryparam cfsqltype="cf_sql_integer" value="#node.getRight()#">
</cfquery>

<!--- Then delete the NST record. --->
<cfset nst().delete(categoryId)>

</cftransaction>
<cfcatch>
<!--- Handle error here --->
</cfcatch>
</cftry>
</cflock>

Moving nodes

The following functions are available to move nodes within your tree:

NestedSetTree Function Description
moveToNextSibling(src,dst) Moves the node 'src' and all its children (subtree) so that it is the next sibling of 'dst'.
moveToPrevSibling(src,dst) Moves the node 'src' and all its children (subtree) so that it is the prev sibling of 'dst'.
moveToFirstChild(src,dst) Moves the node 'src' and all its children (subtree) so that it is the first child of 'dst'.
moveToLastChild(src,dst) Moves the node 'src' and all its children (subtree) so that it is the last child of 'dst'.

And for the Node:

Node Function Description
moveToNextSibling(dst) Moves the node and all its children (subtree) so that it is the next sibling of 'dst'.
moveToPrevSibling(dst) Moves the node and all its children (subtree) so that it is the prev sibling of 'dst'.
moveToFirstChild(dst) Moves the node and all its children (subtree) so that it is the first child of 'dst'.
moveToLastChild(dst) Moves the node and all its children (subtree) so that it is the last child of 'dst'.

Note that in these examples, the sort order of the nodes is managed by the NST rather than the primary data table (via a 'sort' column) so we will only need to update the NST table.

Moving a node 'up'

Example code to move a node up/left within the same parent.

<cfset var prevNode = 0>
<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="exclusive">
<cftry>
<cftransaction>
   <cfset prevNode = nst().prevSibling(categoryId)>
   <cfset nst().moveToPrevSibling(categoryId,prevNode.getId())>
</cftransaction>
<cfcatch>
<!--- Handle error here --->
</cfcatch>
</cftry>
</cflock>

Moving a node 'down'

Example code to move a node down/right within the same parent.

<cfset var nextNode = 0>
<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="exclusive">
<cftry>
<cftransaction>
   <cfset nextNode = nst().nextSibling(categoryId)>
   <cfset nst().moveToNextSibling(categoryId,nextNode.id)>
</cftransaction>
<cfcatch>
<!--- Handle error here --->
</cfcatch>
</cftry>
</cflock>

Moving a node to another branch in the tree

Note that nodes may only be moved to either parent or sibling nodes, so we need to ensure the move is permitted.

<!--- If the destination node is a child of the source node, then the move is not permitted. --->
<cfset var moveAllowed = false>
<cftry>
<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="readonly">
   <cfset moveAllowed = not nst().isChildOrEqual(dstId,srcId)>
</cflock>
<cfcatch>
<!--- Handle errors (such as the src or dst nodes not existing --->
</cfcatch>
</cftry>

<cfif not moveAllowed>
   <cfreturn>
</cfif>

<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="exclusive">
<cftry>
<cftransaction>

<!--- Change the node's parent in the primary table. --->
<cfquery ...>
   update category
   set parentCategoryId = <cfqueryparam cfsqltype="cf_sql_integer" value="#dstId#">
   where categoryId = <cfqueryparam cfsqltype="cf_sql_integer" value="#srcId#">
</cfquery>

<!--- Move the node in the NST. --->
<cfset nst().moveToLastChild(srcId,dstId)>

</cftransaction>
<cfcatch>
<!--- Handle error here --->
</cfcatch>
</cftry>
</cflock>

Querying the data

The library provides some functions to query the NST table:

NestedSetTree Function Description
numChildren(node) Returns a count of all children in the subtree of the node.
level(node) Returns the node depth level, with the root level starting at 0.
getSubtree(node) Returns the complete subtree of the specified node.
walkPreorder() Returns the complete tree hierarchy.

And for the Node:

Node Function Description
numChildren() Returns a count of all children in the subtree of the node.
level() Returns the node depth level, with the root level starting at 0.
getSubtree() Returns the complete subtree of the specified node.

However, when querying the data you mostly want to also obtain data from your primary table:

Getting the complete tree hierarchy

The 'left' column in the NST provides all we need to get a complete hierarchy of the tree.

<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="readonly">
<cfquery name="q" ...>
   select
      c.*,
      nst.lft,
      nst.rgt
   from
      category c
      inner join categoryNST nst on c.categoryId = nst.id
   order by
      nst.lft
</cfquery>
</cflock>

It is often useful to get the depth level of the nodes at the same time.

Getting the level of a node

First let's look at getting the depth of a single node.

<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="readonly">

<!--- Get the node. --->
<cfset node = nst().getNode(categoryId)>

<!--- Get the level --->
<cfquery name="q" ...>
   select count(*) as level
   from categoryNST
   where
      lft < #node.getLeft()#
      and rgt > #node.getRight()#
</cfquery>

</cflock>

<cfreturn q.level>

Getting the complete tree hierarchy with depth levels

Now we can combine the complete tree hierarchy and level queries together. The level query is included as a subquery.

<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="readonly">
<cfquery name="q" ...>
   select
      c.*,
      (
         select count(*)
         from categoryNST c2
         where
            c2.lft < nst.lft
            and c2.rgt > nst.rgt
      ) as level,
      nst.lft,
      nst.rgt
   from
      category c
      inner join categoryNST nst on c.categoryId = nst.id
   order by
      nst.lft
</cfquery>
</cflock>

Getting a subtree with depth levels

If you only need to get a subtree (rather than the complete tree) with levels, then you may use code such as:

<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="readonly">
<cfset node = nst().getNode(categoryId)>
<cfquery name="q" ...>
   select
      c.*,
      (
         select count(*)
         from categoryNST c2
         where
            c2.lft < nst.lft
            and c2.rgt > nst.rgt
      ) as level,
      nst.lft,
      nst.rgt
   from
      category c
      inner join categoryNST nst on c.categoryId = nst.id
   where
      nst.lft >= #node.getLeft()#
      and nst.rgt <= #node.getRight()#
   order by
      nst.lft
</cfquery>
</cflock>

Getting a breadcrumb record set

Use the following to get a query of the breadcrumb links for a node.

<cflock   name="#nst().getLockName()#" timeout="#nst().getLockTimeout()#" type="readonly">
<cfquery name="q" ...>
   select
      c.*,
      nstB.lft,
      nstB.rgt
   from
      categoryNST nstA
      cross join categoryNST nstB
      inner join category c on nstB.id = c.categoryId
   where
      nstA.lft between nstB.lft and nstB.rgt
      and nstA.id = #categoryId#
   order by
      nstB.lft
</cfquery>
</cflock>

Rendering nested set output as an unordered list

You can display a nested set query's output as nested unordered lists using the following code.

First you would need to execute a query that returned either the entire tree or a subtree plus corresponding depth levels (as described above).

In this example, the "treeQuery" query contains a "level" column and a "categoryName" column.

<cfset prevLevel = -1>
<cfset currLevel = -1>
<cfloop query="treeQuery">
   <cfset currLevel = level>
   <cfif currLevel gt prevLevel>
      <ul><li>#categoryName#
   <cfelseif currLevel lt prevLevel>
      <cfset tmp = prevLevel>
      <cfloop condition="tmp gt currLevel">

         </li></ul>
         <cfset tmp = tmp - 1>
      </cfloop>
      </li><li>#categoryName#
   <cfelse>
      </li><li>#categoryName#
   </cfif>
   <cfset prevLevel = level>
</cfloop>
<cfset tmp = currLevel>
<cfloop condition="tmp ge 0">
   </li></ul>
   <cfset tmp = tmp - 1>
</cfloop>

Creating a NST for existing data

If you have existing data that you want to convert to use the NST model, then the library provides a rebuildIndex() function to help you with this conversion:

This function has four parameters:

Parameter Description
primaryTable The name of your primary table
primaryTableId The name of your primary table's primary key column (integer only)
primaryTableParentId The name of your primary table's parent id column (integer only)
rootId The id of the root record in your primary table. This is the start record for the conversion.

When run, all records in the NST table will be deleted and recreated.

Example usage:

<cfset rootNodeId = 1>
<cfset nst().rebuildIndex("category","categoryId","parentCategoryId",rootNodeId)>

Reindex Utility

The demo application also comes with a simple utility for reindexing a table.

Browse to the demo site and click on the Reindex Existing Table link. Here you will see a form where you can specify datasource details, nested set tree table details and source data table details.

References

Nested Set Trees, a PHP/mysql implementation
http://www.edutech.ch/contribution/nstrees/index.php

Managing Hierarchical Data in MySQL
http://dev.mysql.com/tech-resources/articles/hierarchical-data.html

Storing Hierarchical Data in a Database
http://www.sitepoint.com/article/hierarchical-data-database