Headline

Employ a document-oriented database

Motivation

This implementation demonstrates the basic features of a Document-oriented database by example of Technology:MongoDB. Entries do not need to adhere to a rigid tuple shape like in a relational database. In MongoDB, entries are contained in Language:BSON objects which are organized in collections. The implementation is done in Language:JavaScript which is MongoDB's primary query language and shows how to use of the MapReduce pattern to process contained data inside the database. Typical patterns for a data model most beneficial for a document-oriented DB are also explained.

Illustration

One MongoDB instance can handle many databases, in which collections of documents reside. In this implementation the 101companies Feature:Hierarchical company is mapped onto the MongoDB server as follows:

  • Each company has its own database.
  • Per database there is:
    • One collection for employees
    • one collection for departments
Each employee refers to her department by its ObjectID, a unique identifier assigned by MongoDB to each entry. This is also employed to have departments refer to their super-departments in the tree, as well as to their managers. For top-level departments, this reference is simply omitted.

An example employee entry looks like this:

{
	" id" : ObjectId("4fd1ef6716fc9783d9e779f0"),
	"address" : {
		"city" : "Koblenz",
		"country" : "Germany"
	},
	"dept" : ObjectId("4fd1ea2942afa58847224864"),
	"name" : "Ralf",
	"salary" : 1234
}

Each department carries information about its direct parent under the key "super" and a list of all its ancestor department Object IDs under "ancestors". Note that both the "ancestors" and "super" field are omitted in top-level departments as those have neither and there is now schema forcing us to keep these entries.

An example Department looks like this:

{
	" id" : ObjectId("4fd2061816fc9783d9e779f6"),
	"manager" : ObjectId("4fd2073116fc9783d9e779f7"),
	"super" : ObjectId("4fd1f7b616fc9783d9e779f3"),
	"name" : "Dev1.1",
	"ancestors" : [
		ObjectId("4fd1f7b616fc9783d9e779f3"),
		ObjectId("4fd1f78316fc9783d9e779f2")
	]
}

The reasoning behind the collection layout is as follows: One could easily use the capabilities of MongoDB to save a company as one document, like this:

{
        "company" : "meganalysis"
        "depts" : [
                        {
                                "name" : "Research",
                                "manager" : {...},
                                "employees" : [...]
                        },
                        {
                                "name" : "Development",
                                "manager" : {...},
                                "employees" : [...],
                                "subdepts" : [
                                                {
                                                    "name" : "Dev1",
                                                    ...
                                                }
                                             ]}]}

This has two major disadvantages: First, it is difficult to search through this structure and secondly MongoDB places a 16 MB limit on single documents. If a company grows large, it might hit this limit. The second problem might apply even if we change "subdepts" and "employees" fields to reference to different documents as the lists might still grow indefinitely. By adding parent information to the children instead, we work around this limitation. For departments, references to all ancestors are also saved as this enables us to build the department hierarchy tree-structure more easily.

Total is implemented using MongoDBs MapReduce capabilities. The map function will emit one entry per employee, all under the same key. The reduce function will just sum up all these entries and return the results. Note: Normally, the result of a MapReduce computation will be written into a new collection inside MongoDB, due to the small size of the example, the server is advised to simply return the computation as a JSON object.

Cut is implemented by simply iterating over the employees collection and changing the salary field for each document inside.

In addition to this, MapReduce code to create all subtrees to a company is provided. This is achieved as follows:

The map function essentially takes the list of ancestors for each department and emits pairs of {ancestor, dept} for each of the department's ancestors. The reduce function then gathers all pairs who are keyed by the same ancestors, yielding all subtrees contained in the hierarchy. In a second step, using a few simple queries on the employee database, all employees that are associated with each subtrees are added.

Usage

In the json subfolder are two collection dumps created via the mongoexport tool. To import those, make sure you have a MongoDB server running on your local machine and simply run the packaged script:

$ ./rebuild.sh

This will reimport the employee and dept collections for the meganalysis company into the local MongoDB instance.

Feature demonstrations are run using the mongo command by simply specifying the correct database and the corresponding JavaScript files as command line arguments like this:

$ mongo meganalysis total.js

Each script will print out the results of the respective computation.


There are no revisions for this page.

User contributions

    This user never has never made submissions.

    User edits

    Syntax for editing wiki

    For you are available next options:

    will make text bold.

    will make text italic.

    will make text underlined.

    will make text striked.

    will allow you to paste code headline into the page.

    will allow you to link into the page.

    will allow you to paste code with syntax highlight into the page. You will need to define used programming language.

    will allow you to paste image into the page.

    is list with bullets.

    is list with numbers.

    will allow your to insert slideshare presentation into the page. You need to copy link to presentation and insert it as parameter in this tag.

    will allow your to insert youtube video into the page. You need to copy link to youtube page with video and insert it as parameter in this tag.

    will allow your to insert code snippets from @worker.

    Syntax for editing wiki

    For you are available next options:

    will make text bold.

    will make text italic.

    will make text underlined.

    will make text striked.

    will allow you to paste code headline into the page.

    will allow you to link into the page.

    will allow you to paste code with syntax highlight into the page. You will need to define used programming language.

    will allow you to paste image into the page.

    is list with bullets.

    is list with numbers.

    will allow your to insert slideshare presentation into the page. You need to copy link to presentation and insert it as parameter in this tag.

    will allow your to insert youtube video into the page. You need to copy link to youtube page with video and insert it as parameter in this tag.

    will allow your to insert code snippets from @worker.