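When I started organizing data stored as OWL or RDF, I found that running SPARQL directly with a tool such as Jena ARQ performs very poorly. So if you really want to put an application that queries RDF/OWL data with SPARQL into practice, you have to find a good database built specifically to address the performance problem.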
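After all, the RDF standard has been around for more than ten years, and there are more RDF-related databases than one might expect; the hard part is choosing a suitable one. The table below gives an overview of how RDF databases, and the database systems they build on, have evolved.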
Apache Jena TDB is described as faster, more scalable, and better supported than Jena SDB, a non-native system that relies on a relational DBMS. TDB is, for instance, the persistence layer behind the Fuseki SPARQL server. Its architecture is built around three components: a node table, triple/quad indexes, and a prefixes table. The node table stores the dictionary and follows the two-mappings approach presented in Chapter 4 of the referenced book: in practice, the string-to-id mapping is implemented with B+trees and the id-to-string mapping with a sequential file. A large cache keeps data retrieval fast during query processing. Triple and quad indexes are kept in dedicated structures whose entries hold three and four node-table identifiers, respectively, and these indexes are persisted as B+trees.
The system supports SPARQL Update operations, which run as ACID transactions (serializable isolation level) using write-ahead logging (WAL): a write transaction is first recorded in a journal and is copied into the main database later, when resources permit. One benefit of this approach is that read transactions need no locking. Finally, Jena TDB provides a bulk loader, which does not support transactions. Its other features, such as security aspects and several APIs, make it a solution worth considering in a production setting.
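To make the node-table and index design more concrete, here is a minimal Python sketch of the same ideas. It is a toy model under stated assumptions, not Jena's code or API: the class names `NodeTable` and `TinyTripleStore` are hypothetical, a dict and a list stand in for the B+tree and the sequential file of the two mappings, sorted Python lists searched with `bisect` stand in for the B+tree indexes, and `commit()` replays a journal in a single thread as a simplified picture of write-ahead logging rather than TDB's actual transaction protocol. The cache and the prefixes table are left out.

```python
from bisect import bisect_left
from typing import Dict, Iterator, List, Optional, Tuple

IdTriple = Tuple[int, int, int]


class NodeTable:
    """Dictionary with two mappings: string -> id and id -> string."""

    def __init__(self) -> None:
        self._to_id: Dict[str, int] = {}  # assumption: stand-in for the B+tree
        self._to_str: List[str] = []  # assumption: stand-in for the sequential file; index == id

    def encode(self, node: str) -> int:
        """Return the id of `node`, assigning the next free id if it is new."""
        node_id = self._to_id.get(node)
        if node_id is None:
            node_id = len(self._to_str)
            self._to_id[node] = node_id
            self._to_str.append(node)
        return node_id

    def lookup(self, node: str) -> Optional[int]:
        """Return the id of `node` without assigning one (None if unknown)."""
        return self._to_id.get(node)

    def decode(self, node_id: int) -> str:
        return self._to_str[node_id]


class TinyTripleStore:
    """Hypothetical toy store: node table + three sorted id indexes + a journal."""

    # Each index keeps id triples permuted into this order of (s, p, o) positions.
    ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}

    def __init__(self) -> None:
        self.nodes = NodeTable()
        self.indexes: Dict[str, List[IdTriple]] = {n: [] for n in self.ORDERS}
        self.journal: List[IdTriple] = []  # moved into the indexes only by commit()

    def add(self, s: str, p: str, o: str) -> None:
        """Encode the triple and append it to the journal (not yet queryable)."""
        self.journal.append(tuple(self.nodes.encode(t) for t in (s, p, o)))

    def commit(self) -> None:
        """Replay journaled triples into every index, then clear the journal."""
        for ids in self.journal:
            for name, order in self.ORDERS.items():
                key = tuple(ids[i] for i in order)
                index = self.indexes[name]
                pos = bisect_left(index, key)
                if pos == len(index) or index[pos] != key:  # skip duplicates
                    index.insert(pos, key)
        self.journal.clear()

    def find(self, s=None, p=None, o=None) -> Iterator[Tuple[str, ...]]:
        """Match one triple pattern over committed data; None means 'any value'."""
        ids: List[Optional[int]] = []
        for term in (s, p, o):
            node_id = None if term is None else self.nodes.lookup(term)
            if term is not None and node_id is None:
                return  # a constant that was never stored cannot match
            ids.append(node_id)

        def bound_prefix(order: Tuple[int, ...]) -> Tuple[int, ...]:
            prefix = []
            for i in order:
                if ids[i] is None:
                    break
                prefix.append(ids[i])
            return tuple(prefix)

        # Use the index whose key prefix covers the most bound terms.
        name = max(self.ORDERS, key=lambda n: len(bound_prefix(self.ORDERS[n])))
        order, index = self.ORDERS[name], self.indexes[name]
        prefix = bound_prefix(order)

        # Range scan: start at the first key >= prefix, stop once it stops matching.
        for pos in range(bisect_left(index, prefix), len(index)):
            key = index[pos]
            if key[: len(prefix)] != prefix:
                break
            spo = [0, 0, 0]
            for k, position in enumerate(order):
                spo[position] = key[k]  # undo the permutation back to (s, p, o)
            yield tuple(self.nodes.decode(i) for i in spo)


if __name__ == "__main__":
    store = TinyTripleStore()
    store.add("ex:alice", "foaf:knows", "ex:bob")
    store.add("ex:bob", "foaf:knows", "ex:carol")
    print(list(store.find(p="foaf:knows")))  # [] (still only in the journal)
    store.commit()
    print(list(store.find(p="foaf:knows")))  # both foaf:knows triples
```

In this sketch the three index orders are cyclic rotations of (s, p, o), so any set of bound positions in a single triple pattern is a prefix of at least one of them, and `find` can answer the pattern with one range scan over integer ids instead of comparing strings. Readers only consult the indexes, so a triple added before `commit()` stays invisible to queries until the journal is replayed.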
Further reading
1. Olivier Curé, Guillaume Blin. RDF Database Systems. Morgan Kaufmann, 2014.