473897be3a2e — Chris Cannam 9 years ago
* Docs and tidying
2 files changed, 138 insertions(+), 209 deletions(-)

M objectmapper/ObjectLoader.cpp
M objectmapper/ObjectLoader.h
M objectmapper/ObjectLoader.cpp +47 -206
@@ 49,92 49,56 @@ 
 
 namespace Dataquay {
 
-/*
- * The various sets and maps that get passed around are:
- *
- * NodeObjectMap &map -- keeps track of node-object mappings for nodes
- * that are being loaded, and also for any other nodes that the
- * calling code is interested in (this map belongs to it, we just
- * update it).  We can refer to this to find out the object for a
- * given node, but it's reliable only when we know that we have just
- * loaded that node ourselves -- otherwise the value may be stale,
- * left over from previous work.  (However, we will still use it if we
- * need that node for an object property and FollowObjectProperties is
- * not set so we aren't loading afresh.)
- *
- * NodeSet &examined -- the set of all nodes we have seen and loaded
- * (or started to load) so far.  Used solely to avoid infinite
- * recursions.
- *
- * What we need:
- *
- * NodeObjectMap as above.
- *
- * Something to list which nodes we need to load. This is populated
- * from the Node or Nodes passed in, and then we traverse the tree
- * using the follow-properties adding nodes as appropriate.
- *
- * Something to list which nodes we have tried to load.  This is our
- * current examined set.  Or do we just remove from the to-load list?
- * 
- * Workflow: something like --
- * 
- * - Receive list A of the nodes the customer wants
- *
- * - Receive node-object map B from customer for updating (and for our
- *   reference, as soon as we know that a particular node has been
- *   updated)
- *
- * - Construct set C of nodes to load, initially empty
- *
- * - Traverse list A; for each node:
- *   - if node does not exist in store, set null in B and continue
- *   - add node to set C
- *   - push parent on end of A if FollowParent and parent property
- *     is present and parent is not in C
- *   - push prior sibling on end of A if FollowSiblings and follows 
- *     property is present and sibling is not in C
- *   - likewise for children
- *   - likewise for each property node if FollowObjectProperties
- *     and node is not in C
- *
- * Now we really need a version D of set C (or list A) which is in
- * tree traversal order -- roots first...
- *
- * - Traverse D; for each node:
- *   - load node, recursing to parent and siblings if necessary,
- *     but do not set any properties
- *   - put node value in B
- *
- * - Traverse D; for each node:
- *   - load properties, recursing if appropriate
- *   - call load callbacks
- * 
- * But we still need the examined set to avoid repeating ourselves
- * where an object is actually un-loadable?
- */
 
 /*
+ * Generally we pass around a LoadState object recording current
+ * progress and work still to do.  See LoadState for the terminology
+ * used.
+ *
+ * We have five phases:
+ *
+ * 1. Collect -- given the requested node set, fill the toAllocate,
+ * toInitialise, and toPopulate sets based on the FollowPolicy.
+ *
+ * 2. Allocate -- construct new objects for those nodes in toAllocate
+ * that are not yet in the map.  Recurse as appropriate to parent,
+ * siblings and children.  (toAllocate contains all the nodes we will
+ * need to load, including those we will recurse to when assigning
+ * properties for FollowObjectProperties policy later.)
+ * 
+ * 3. Initialise -- set the "literal" properties for each node in
+ * toInitialise
+ *
+ * 4. Populate -- set all remaining properties for each node in
+ * toPopulate
+ *
+ * 5. Callbacks -- call any registered load callbacks for each node
+ *
+ *!!! Originally we separated out initialise and populate for the
+ * convenience of users, in order to get basic definitional properties
+ * set before attaching objects as children or properties.  However,
+ * this doesn't really hold in practice because we must attach
+ * children before the initialise phase.  Is there still value in it?
+ * Is there any harm in it?
+ *
+ * Notes
+ *  
  * We should construct a new object when:
  *
- * - a node in desired does not appear in map at all or is null in map
+ * - a node in requested does not appear in map or is null in map
  *
  * - a node called for as parent or object property does not appear in
  *   map and the relevant follows policy is set
  *
- * - ??? a node's parent changes?
- *
  * We should reload properties for an object node when:
  * 
- * - the node is in desired
+ * - the node is in requested
  *
  * - the node has just been loaded
  *
- * - any others??
- *
  * We should delete an object when:
  * 
- * - a node in desired is in the map but not in the store
+ * - a node in requested is in the map but not in the store
  *
  * Cycles are only a problem when loading objects -- not when setting
  * properties.  So we need to ensure that all objects that will need

          
@@ 142,20 106,6 @@ namespace Dataquay {
  * before we start loading, then we load relationship tree (no
  * cycles), then we load property objects, then we go through setting
  * properties on the appropriate objects.
- *
- *!!! latest trouble: We probably want to set properties on an object
- *    before we attach its children. Is this possible here? Set simple
- *    properties before doing child nodes... hm, but that means doing
- *    so in load rather than populate, and the set of nodes being
- *    loaded is smaller than the set being initialised
- *
- * Current terminology: 
- * - requested: nodes the customer asked to be loaded or reloaded
- * - toAllocate: nodes whose objects will need to be constructed (if possible)
- * - toInitialise: nodes pending the first (literal) properties set
- * - toPopulate: nodes pending the full properties set
- * (we separate out initialise and populate for convenience of users,
- * so we get basic properties set before children are attached)
  */
 
 class ObjectLoader::D
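
The five phases described in the comment above can be illustrated with a minimal standalone sketch. It is not the Dataquay implementation: `ToyStore`, `Object`, `loadAll` and the string-based `Node` are hypothetical stand-ins, only a parent relationship and one literal property are modelled, and standard containers replace the Qt ones. Only the phase ordering and the role of the `toAllocate`, `toInitialise` and `toPopulate` sets are taken from the comment.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: a node is just a name, and the toy store records
// each node's parent (if any) and one literal label.
using Node = std::string;

struct Object {
    std::string label;          // assigned in the initialise phase
    Object *parent = nullptr;   // assigned in the allocate phase
    bool populated = false;     // set in the populate phase
};

struct ToyStore {
    std::map<Node, Node> parentOf;
    std::map<Node, std::string> labelOf;
};

enum FollowFlags { FollowNone = 0, FollowParent = 1 << 0 };

struct LoadState {
    std::vector<Node> requested;
    std::set<Node> toAllocate, toInitialise, toPopulate;
    std::map<Node, std::unique_ptr<Object>> map;
    std::vector<std::string> log;   // records the order of phase actions
};

// Phase 1: decide which nodes need work, following parents if asked to.
void collect(const ToyStore &store, LoadState &state, unsigned flags) {
    std::vector<Node> pending = state.requested;
    while (!pending.empty()) {
        Node n = pending.back();
        pending.pop_back();
        if (!state.toAllocate.insert(n).second) continue; // already collected
        state.toInitialise.insert(n);
        state.toPopulate.insert(n);
        auto p = store.parentOf.find(n);
        if ((flags & FollowParent) && p != store.parentOf.end()) {
            pending.push_back(p->second);
        }
    }
}

// Phase 2: construct the object for a node, recursing to its parent first.
Object *allocate(const ToyStore &store, LoadState &state, const Node &n) {
    if (!state.toAllocate.count(n)) {
        // Not (or no longer) pending: return whatever the map holds, if anything.
        auto found = state.map.find(n);
        return found == state.map.end() ? nullptr : found->second.get();
    }
    state.toAllocate.erase(n); // guards against cycles in the parent chain
    auto obj = std::make_unique<Object>();
    auto p = store.parentOf.find(n);
    if (p != store.parentOf.end()) obj->parent = allocate(store, state, p->second);
    state.log.push_back("allocate " + n);
    return (state.map[n] = std::move(obj)).get();
}

// Runs all five phases in order.
void loadAll(const ToyStore &store, LoadState &state, unsigned flags) {
    collect(store, state, flags);
    const std::set<Node> nodes = state.toAllocate; // copy: allocate() erases
    for (const Node &n : nodes) allocate(store, state, n);
    for (const Node &n : state.toInitialise) {     // phase 3: literal values
        auto l = store.labelOf.find(n);
        if (l != store.labelOf.end()) state.map[n]->label = l->second;
        state.log.push_back("initialise " + n);
    }
    for (const Node &n : state.toPopulate) {       // phase 4: everything else
        state.map[n]->populated = true;
        state.log.push_back("populate " + n);
    }
    for (const Node &n : nodes) state.log.push_back("callback " + n); // phase 5
}
```

In this sketch, requesting a child node with `FollowParent` set allocates the parent before the child, and every object exists before any property is assigned. Without the flag, the parent is never collected, so the child's parent pointer stays null unless the caller's map already holds an object for it.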

          
@@ 167,13 117,23 @@ public:
 
         LoadState() : loadFlags(0) { }
 
+        /// Nodes the customer has explicitly asked to load or reload
         Nodes requested;
+
+        /// Nodes whose objects will need to be constructed if possible
         NodeSet toAllocate;
+
+        /// Nodes pending the first (literal) property assignment
         NodeSet toInitialise;
+
+        /// Nodes pending the full property assignment
         NodeSet toPopulate;
+
+        /// All known node-object correspondences, to be updated as we go
         NodeObjectMap map;
 
         enum LoadFlags {
+            /// Do not throw exception if RDF type unknown (for loadAll etc)
             IgnoreUnknownTypes = 1 << 0,
         };
         unsigned int loadFlags;

          
@@ 581,16 541,7 @@ private:
 
         //!!! too many of these tests, some must be redundant
         if (!state.toAllocate.contains(node)) return;
-/*
-        if (!nodeHasTypeInStore(node)) {
-            DEBUG << "Node " << node << " has no type in store, can't load, setting to 0 in map" << endl;
-            delete state.map.value(node); //!!! what if it had child nodes?
-            state.map.insert(node, 0);
-            state.toAllocate.remove(node);
-            state.toInitialise.remove(node);
-            return;
-        }
-*/
+
         if (m_fp & FollowSiblings) {
             Nodes siblings = orderedSiblingsOf(node);
             foreach (Node s, siblings) {

          
@@ 623,13 574,6 @@ private:
 
         DEBUG << "loadSingle: " << node << " (parent = " << parentObject << ")" << endl;
 
-    //!!! document the implications of this -- e.g. that if
-    //!!! multiple objects have properties with the same node in
-    //!!! the RDF, they will be given the same object instance --
-    //!!! so it is generally a good idea to have "ownership" and
-    //!!! lifetime of these objects managed by something other
-    //!!! than the objects that have the properties
-
         //!!! too many of these tests, some must be redundant
         if (!state.toAllocate.contains(node)) {
             DEBUG << "already loaded: returning existing value (" << state.map.value(node) << ")" << endl;

          
@@ 669,22 613,10 @@ private:
         }
     }
 
-/*
-
-
-    QObject *load(NodeObjectMap &map, NodeSet &examined, const Node &n,
-                  QString classHint = "");
-*/
-
     QString getClassNameForNode(Node node);
 
     QObject *allocateObject(Node node, QObject *parent);
-/*
-    QObject *loadSingle(NodeObjectMap &map, NodeSet &examined, Node node, QObject *parent,
-                        QString classHint, CacheingPropertyObject *po);
 
-    void callLoadCallbacks(NodeObjectMap &map, Node node, QObject *o);
-*/
     void initialise(LoadState &, Node node);
     void populate(LoadState &, Node node);
 

          
@@ 700,66 632,6 @@ private:
     QVariant propertyNodeToVariant(LoadState &, QString typeName, Node pnode);
     QVariantList propertyNodeToList(LoadState &, QString typeName, Node pnode);
 };
-/*
-QObject *
-ObjectLoader::D::load(NodeObjectMap &map, NodeSet &examined, const Node &n,
-                      QString classHint)
-{
-    DEBUG << "load: Examining " << n << endl;
-
-    if (m_fp != FollowNone) {
-        examined.insert(n);
-    }
-
-    CacheingPropertyObject po(m_s, m_tm.getPropertyPrefix().toString(), n);
-
-    if (m_fp & FollowSiblings) {
-        //!!! actually, wouldn't this mean query all siblings and follow all of those? as on store? but do we _ever_ actually want this behaviour?
-        QString followsProp = m_tm.getRelationshipPrefix().toString() + "follows";
-        if (po.hasProperty(followsProp)) {
-            Node fn = po.getPropertyNode(followsProp);
-            //!!! highly inefficient if we are last of many siblings
-            if (!examined.contains(fn)) {
-                try {
-                    load(map, examined, fn);
-                } catch (UnknownTypeException) {
-                    //!!!
-                }
-            }
-        }
-    }
-
-    QObject *parent = 0;
-    QString parentProp = m_tm.getRelationshipPrefix().toString() + "parent";
-    if (po.hasProperty(parentProp)) {
-        Node pn = po.getPropertyNode(parentProp);
-        try {
-            if (m_fp & FollowParent) {
-                if (!examined.contains(pn)) {
-                    DEBUG << "load: FollowParent is set, loading parent of " << n << endl;
-                    load(map, examined, pn);
-                }
-            } else if (map.contains(pn) && map.value(pn) == 0) {
-                DEBUG << "load: Parent of node " << n << " has not been loaded yet, loading it" << endl;
-                load(map, examined, pn);
-            }
-        } catch (UnknownTypeException) {
-            //!!!
-        }
-        parent = map.value(pn);
-    }
-
-    //!!! and FollowChildren... (cf loadTree)
-
-    //!!! NB. as this stands, if the RDF is "wrong" containing the
-    //!!! wrong type for a property of this, we will fail the whole
-    //!!! thing with an UnknownTypeException -- is that the right
-    //!!! thing to do? consider
-
-    QObject *o = loadSingle(map, examined, n, parent, classHint, &po);
-    return o;
-}
-*/
 
 void
 ObjectLoader::D::initialise(LoadState &state, Node node)

          
@@ 1024,9 896,7 @@ ObjectLoader::D::propertyNodeToObject(Lo
     QObject *o = 0;
 
     if (pnode.type == Node::URI || pnode.type == Node::Blank) {
-//!!!        if (state.candidates.contains(pnode)) {
-            o = loadSingle(state, pnode);
-//        }
+        o = loadSingle(state, pnode);
     } else {
         DEBUG << "Not an object node, ignoring" << endl;
     }

          
@@ 1116,36 986,7 @@ ObjectLoader::D::allocateObject(Node nod
 
     return o;
 }
-/*
-QObject *
-ObjectLoader::D::loadSingle(NodeObjectMap &map, NodeSet &examined,
-                            Node node, QObject *parent,
-                            QString classHint, CacheingPropertyObject *po)
-{
-    DEBUG << "loadSingle: " << node << endl;
 
-    QObject *o = map.value(node);
-    if (!o) {
-        o = allocateObject(map, node, parent, classHint, po);
-    }
-
-    loadProperties(map, examined, o, node, po);
-
-    callLoadCallbacks(map, node, o);
-
-    return o;
-}
-
-void
-ObjectLoader::D::callLoadCallbacks(NodeObjectMap &map, Node node, QObject *o)
-{
-    foreach (LoadCallback *cb, m_loadCallbacks) {
-        //!!! this doesn't really work out -- the callback doesn't know whether we're loading a single object or a graph; it may load any number of other related objects into the map, and if we were only supposed to load a single object, we won't know what to do with them afterwards (at the moment we just leak them)
-
-        cb->loaded(m_m, map, node, o);
-    }
-}
-*/
 ObjectLoader::ObjectLoader(Store *s) :
     m_d(new D(this, s))
 { }

          
M objectmapper/ObjectLoader.h +91 -3
@@ 50,9 50,97 @@ class TypeMapping;
 /**
  * \class ObjectLoader ObjectLoader.h <dataquay/objectmapper/ObjectLoader.h>
  *
- * ObjectLoader can create and refresh objects based on the types and
- * relationships set out in a Store.  
- *!!!
+ * ObjectLoader constructs objects corresponding to nodes in the RDF
+ * store and sets properties on those objects corresponding to the
+ * node's RDF properties.  The class of each object is based on the
+ * node's RDF type.  TypeMapping is used to relate node types to
+ * object classes, and ObjectBuilder is used to construct the objects
+ * (which must be subclasses of QObject).
+ * 
+ * In addition to some specification of which nodes to load,
+ * ObjectLoader methods may also take a reference to a NodeObjectMap
+ * in which is stored the object corresponding to each loaded node.
+ * This map may be used by the caller as a persistent record of
+ * node-object relationships, as it is updated on each new
+ * ObjectLoader call with any unaffected nodes remaining unchanged in
+ * the map.
+ *
+ * By default, ObjectLoader loads only the objects for those nodes
+ * passed in to each load() or reload() call.  ObjectLoader sets as
+ * many QObject properties on each object as possible, given the
+ * information available to it:
+ *
+ * \li Properties with non-object-type values will be assigned from
+ * RDF properties with literal value nodes, provided Node::toVariant
+ * is able to carry out the conversion from literal;
+ *
+ * \li Properties with object-type values will be assigned from RDF
+ * properties with URI value nodes, provided those URI nodes have
+ * corresponding objects available to ObjectLoader, i.e. also in the
+ * set being loaded or in the NodeObjectMap.  (However, see also
+ * FollowObjectProperties below.)
+ * 
+ * \li Properties whose value types are sequenced containers such as
+ * QList or std::vector will be assigned from RDF properties with
+ * sequence values, provided their container types have been
+ * registered with ContainerBuilder;
+ *
+ * \li Properties whose value types are set containers such as QSet
+ * will be assigned from the aggregation of all RDF properties with
+ * the appropriate subject and predicate, provided their container
+ * types have been registered with ContainerBuilder.
+ *
+ * Some behaviour can be adjusted using setFollowPolicy and
+ * setAbsentPropertyPolicy, as follows:
+ *
+ * \li \c FollowPolicy is a set of flags describing how ObjectLoader
+ * should recurse from each object to those related to it.  It can be
+ * used to cause ObjectLoader to load more objects than are explicitly
+ * requested.  The flag \c FollowObjectProperties causes objects to be
+ * loaded whenever they are required as the values of properties on
+ * other objects in the loaded set.  The flags \c FollowParent, \c
+ * FollowSiblings and \c FollowChildren cause object tree
+ * relationships to be followed up, using the "parent" and "follows"
+ * properties with the TypeMapping's relationship prefix to determine
+ * family relationships.
+ *
+ * \li \c AbsentPropertyPolicy determines how ObjectLoader handles
+ * properties of an object that have no definition in the RDF store.
+ * These properties are ignored if IgnoreAbsentProperties (the
+ * default) is set, but if ResetAbsentProperties is set ObjectLoader
+ * will attempt to reset each property to its default value based on
+ * the value found in a freshly-constructed default instance of the
+ * object class in question.
+ *
+ * The load procedure follows a defined order:
+ *
+ * \li The requested objects, and any relatives required by the
+ * FollowPolicy, are constructed with their default properties (no
+ * properties assigned from RDF yet).  If the FollowPolicy includes
+ * FollowParent or FollowSiblings, these will be followed before the
+ * current object is loaded; if FollowChildren, they will be followed
+ * afterwards;
+ *
+ * \li After all objects have been constructed, those properties that
+ * have "simple" RDF literal values are assigned for each object;
+ *
+ * \li After all "simple" properties have been assigned, any further
+ * properties are set (those with container and object types);
+ *
+ * \li Finally, any callbacks registered with addLoadCallback are
+ * called for each object that was loaded (i.e. any object that was
+ * either constructed or assigned to).
+ *
+ * Note that ObjectLoader always maintains a one-to-one correspondence
+ * between QObjects and the RDF nodes that it loads as QObjects.  In
+ * particular, where multiple objects have properties that refer to
+ * the same URI, no more than a single value object will be
+ * constructed and the same value object will be assigned to all of
+ * those objects' properties.  This implies that objects to be loaded
+ * using ObjectLoader should be designed so that they do not attempt
+ * to "own" (control lifecycle for) any other QObjects that appear as
+ * their properties.  Ownership must be maintained separately from the
+ * property relationship.
  *
  * ObjectLoader is re-entrant, but not thread-safe.
  */
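
The ownership note in the class documentation can be illustrated with a small standalone sketch. It does not use the Dataquay or Qt APIs: `Artist`, `Track` and `NodeObjectRegistry` are hypothetical names chosen for illustration, and the registry stands in for whatever external owner keeps the value objects alive. The point shown is that two objects whose properties refer to the same URI receive the same instance, so neither may delete it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical value object, identified by the URI of its node.
struct Artist { std::string uri; };

// Hypothetical loaded object with one object-valued property.
// The pointer is non-owning: the track must never delete its artist.
struct Track { Artist *artist = nullptr; };

// Owns every value object, keyed by URI, so that each node corresponds
// to exactly one object regardless of how many properties refer to it.
class NodeObjectRegistry {
public:
    Artist *artistFor(const std::string &uri) {
        auto &slot = m_artists[uri];
        if (!slot) slot = std::make_unique<Artist>(Artist{uri});
        return slot.get();
    }
    std::size_t size() const { return m_artists.size(); }
private:
    std::map<std::string, std::unique_ptr<Artist>> m_artists;
};
```

If two `Track` objects both ask the registry for the same URI, they end up holding identical pointers and the registry still contains a single `Artist`. Had either track deleted its artist in a destructor, the other would be left with a dangling pointer, which is why lifetime is managed by the registry rather than by the property holders.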