Optimal XML storage engine

3

I'm looking for the best open source solution for storing XML documents and then querying them efficiently. The amount of data will be small. As far as I understand, a native XML database might be a good fit for my case, since they are built to store XML documents efficiently. Any suggestions for a suitable solution? Have you got any experience using XML storage engines in your apps?

2009-06-16 08:44
by nixau
What sort of data volumes are we talking about? MB? GB? TB? - skaffman 2009-06-16 08:59
Another requirement that is very important for me: the XML store must be able to run in memory, without storing data on disk - nixau 2009-06-16 09:05
Then eXist is the only one, I think, that can do this (and only in Java). Oracle XML DB is an embedded solution too, but I think it stores data on disk instead of in memory - SztupY 2009-06-16 09:12
Thanks for the good and helpful suggestions - nixau 2009-06-16 09:31


2

We've been working with native XML servers at work. They're fine if your data is below roughly 100-200 MB, but beyond that I couldn't find a server that could handle the data. I've tried the following:

  • eXist: Java-based native XML server (open source). With large files it usually exhausts the JVM's memory and then throws an out-of-memory exception.
  • Sedna: C-based native XML server (open source). It can handle really large databases, but it segfaults if you try to query non-indexed data.
  • Tamino XML database (proprietary): one of the first XML databases. It is mature but has poor XQuery support (at least in the fairly recent version we're using), and in our experience even a trained professional couldn't set it up to be fast enough.

Here are my suggestions:

  • For small data and Java-based systems, try eXist. It even has SQLite-like file-based database support that might be useful.
  • For small to medium data where performance matters, use Sedna. According to my tests it's the fastest of the three.
  • If you need vendor support, use Tamino. At least it has support.

For large databases (1 GB and up) I wouldn't recommend any of them (yet). eXist usually crashed with a 200 MB database. Sedna crashed with a 1 GB database when querying something that wasn't indexed, and Tamino couldn't even load 500 MB of data in one run before crashing the whole system. Of course all of these systems are evolving, so maybe a bit later they'll be safe to use, but native XML databases are unfortunately still immature.

2009-06-16 08:58
by SztupY
Yep, I won't store a large amount of data, so I think eXist should be considered. And what about query performance and fetching data on demand? - nixau 2009-06-16 09:03
Compared to relational databases like SQLite, they're not that performant. Sedna was the fastest of the three, but it was still slow compared to what a relational database could do. So it depends on the data you want to store. If the schema of the database is "stable", I'd convert the documents to a relational model. If the schema is something you can't easily represent in a relational database, then go for eXist or Sedna. They might be a bit slower, but XQuery is in many ways superior to SQL if you have to query complex expressions (in these cases they might be a bit faster too) - SztupY 2009-06-16 09:09
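
To illustrate the "convert to a relational model" option mentioned above, here is a minimal sketch in Python, assuming a small document with a stable schema. The sample XML, the `book` table, and the helper names `load_books` and `titles_since` are all made up for illustration and do not come from any of the products discussed. It uses only the standard library (`xml.etree.ElementTree` and `sqlite3`) and keeps the database in memory, so nothing is written to disk.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented.
XML_DOC = """<library>
  <book id="1"><title>XQuery Basics</title><year>2007</year></book>
  <book id="2"><title>SQL in Practice</title><year>2003</year></book>
  <book id="3"><title>Native XML Stores</title><year>2009</year></book>
</library>"""


def load_books(xml_text):
    """Shred <book> elements into a table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")  # in-memory only, no file on disk
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (int(b.get("id")), b.findtext("title"), int(b.findtext("year")))
        for b in root.iter("book")
    ]
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def titles_since(conn, year):
    """Return titles published in or after the given year, oldest first."""
    cur = conn.execute(
        "SELECT title FROM book WHERE year >= ? ORDER BY year", (year,)
    )
    return [title for (title,) in cur]


conn = load_books(XML_DOC)
print(titles_since(conn, 2005))
```

This only works well when the XML maps cleanly onto rows and columns. For deeply nested or irregular documents the mapping gets awkward quickly, which is the case where a native XML database and XQuery are the more natural choice.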


1

Have you looked into major vendor-supplied solutions such as Oracle XML DB? I've not tried it, but it would certainly be worth evaluating, provided you have a budget for such things!

Also, Wikipedia has a nice list of XML databases, which you might wish to evaluate.

2009-06-16 09:04
by RichardOD