The implementation is tricky because pyodbc assumes that, as would be the case with most SQL servers, every value returned for a particular column is of the same type. This is not the case for literal values returned from a SPARQL query.
In outline, the patch does the following:
- At connection time, check if we are connecting to a Virtuoso server
- At query execution time, check if the query starts with "SPARQL"
- For each column of each row, issue several Virtuoso-specific ODBC calls to determine the datatype and value.
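The two detection steps above can be sketched as follows. This is illustrative only: `is_sparql` and `connect_virtuoso` are hypothetical names, not part of pyodbc's API, though `Connection.getinfo` with `SQL_DBMS_NAME` is a standard way to ask a server for its product name.

```python
def is_sparql(statement):
    # The patch switches to its SPARQL result handling when the
    # statement begins with the SPARQL keyword; tolerating leading
    # whitespace and case is an assumption made for illustration.
    return statement.lstrip().upper().startswith("SPARQL")


def connect_virtuoso(dsn):
    # Connect-time check: SQLGetInfo(SQL_DBMS_NAME) reports the
    # server product name, so the Virtuoso-specific code paths can
    # be enabled only when actually talking to a Virtuoso server.
    import pyodbc
    cnxn = pyodbc.connect(dsn)
    is_virtuoso = "virtuoso" in cnxn.getinfo(pyodbc.SQL_DBMS_NAME).lower()
    return cnxn, is_virtuoso
```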
We further found that Virtuoso only reports the real datatype when the query also contains define output:valmode "LONG", which returns the internal representation (IRI) of URI and blank node identifiers rather than their string representation. This means that a further SELECT is necessary to retrieve the string representation -- this is left to application code to do.
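Application code might handle this along the following lines; a sketch, assuming Virtuoso's id_to_iri() SQL function is available to map an internal IRI id back to its string form (treat the exact call as an assumption):

```python
LONG_VALMODE = 'define output:valmode "LONG" '


def spasql(query):
    # Prepend the pragma so Virtuoso reports real datatypes and
    # returns internal ids for URIs and blank nodes.
    return "SPARQL " + LONG_VALMODE + query


def iri_lookup_sql():
    # The follow-up SELECT: Virtuoso's id_to_iri() maps an internal
    # IRI id back to its string representation; the parameter marker
    # is filled in by the driver at execute time.
    return "SELECT id_to_iri(?)"

# Usage (assuming an open pyodbc cursor):
#   cursor.execute(iri_lookup_sql(), (iri_id,))
#   iri_string = cursor.fetchone()[0]
```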
When processing a SPASQL query, the returned values are tuples of the form:
(value, dvtype, dttype, flags, lang, datatype)
- value is the IRI for a URI or blank node and the string value for a literal
- dvtype is the Virtuoso-specific data value type; it can be used to distinguish between IRIs and strings, integers, floats, dates, etc.
- dttype is the datetime, date or time flag for date-like types.
- flags holds any flags set on the column
- lang is the language if present for a literal
- datatype is the datatype if present for a literal
It is up to the higher level code to take this tuple and transform it into whatever representation is appropriate (e.g. an rdflib.term.Node instance). This is different from plain SQL queries, which should return the correct Python datatype directly.
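As an illustration, higher-level code might map the tuple onto terms like so. The dvtype codes below are placeholders, not Virtuoso's real constants, and the tagged tuples stand in for the rdflib.term classes noted in the comments:

```python
# Placeholder dvtype codes -- the real values are Virtuoso-specific
# constants surfaced by the patched pyodbc, not these numbers.
DV_IRI, DV_BNODE = 1, 2


def to_term(col):
    value, dvtype, dttype, flags, lang, datatype = col
    if dvtype == DV_IRI:
        return ("uri", value)        # e.g. rdflib.term.URIRef(value)
    if dvtype == DV_BNODE:
        return ("bnode", value)      # e.g. rdflib.term.BNode(value)
    # Literals carry an optional language tag or datatype IRI,
    # e.g. rdflib.term.Literal(value, lang=lang, datatype=datatype).
    return ("literal", value, lang, datatype)
```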
Debian package diff, to be applied in addition to the above patch and packages:
To build on a different Debian-like distribution, download and uncompress the patched source, then run:
make -k -f debian/rules binary
and you should find the .deb packages in the parent directory.