MS RFC 22a: Feature cache for long running processes and query processing

Date:

2007/06/24

Author:

Tamas Szekeres

Contact:

szekerest at gmail.com

Last Edited:

2007/06/24

Status:

Discussion Draft

Version:

MapServer 5.0

1. Overview

Currently the various query operations involve multiple accesses to the data providers which may cause a significant performance impact depending on the providers. In the first phase all of the features in the given search area are retrieved and the index of the relevant shapes are stored in the result cache. In the second phase the features in the result cache are retrieved form the provider one by one. Retaining the shapes in the memory we could eliminate the need of the subsequent accesses to the providers and increase the overall performance of the query. Implementing the cache requires a transformation of the data between the data provider and the client. From this aspect it is desirable to provide a framework to implement this transformation in a higher level of abstraction.

2. Purpose

The main purpose of this RFC is to implement a feature cache and retain the shapes in the memory for the further retrievals during a query operation. However I would also want to create a mechanism so that a layer could use another layer as a data source. This outer layer could apply transformations on the shapes coming from the base layer or even retain the shapes in the memory for the subsequent fetches. Here are some other examples where this framework would provide a solution:

  1. Constructing geometries based on feature attributes

  2. Modifying the geometries or the feature attribute values

  3. Geometry transformations (like GEOS operations)

  4. Feature cache

  5. Providing default layer style based on external style information, or attribute based styling

  6. Providing custom data filters

Setting up a proper layer hierarchy one can solve pretty complex issues without the need of creating additional (eg. mapscript) code. Later on - as a live example - I’ll show up the solution of the following scenario:

  1. The user will select one or more shapes in one layer (by attribute selection in this case).

  2. Cascaded transformations will be applied on the selected shapes (I’ll use the GEOS convexhull and buffer operations)

  3. In another layer the features will be selected based on the transformed shapes as the selection shapes.

In this proposal to ensure the better readability I’ll avoid embedding much of the code inline. However most of the implementation patches are available at the tracking ticket of this RFC (see later).

3. General principles of the solution

This proposal involves writing additional data providers. These providers will use another layer as the source of the features. To set up this hierarchy of the layers, the CONNECTION parameter of these layers will contain the name of the source layer. In some cases the source layer doesn’t participate in the renderings and should be kept internal to the original layer. Therefore we will establish the support for nesting the layers into each other. In this regard the CONNECTION parameter will contain the full path of the layer it refers to. The path contains the layer names in the hierarchy using the slash ‘/’ as the separator between the names. We can specify the pathnames relative to the actual layer or relative to the map itself.

  1. Specify a reference relative to the map:

LAYER
  NAME "roads"
  CONNECTIONTYPE OGR
  CONNECTION "roads.tab"
  ...
END
LAYER
  NAME "cache"
  CONNECTIONTYPE CACHE
  CONNECTION "/roads"
  ...
END
  1. Specify a reference relative to the outer layer

LAYER
  NAME "cache"
  CONNECTIONTYPE CACHE
  CONNECTION "roads"
  ...
  LAYER
    NAME "roads"
    CONNECTIONTYPE OGR
    CONNECTION "roads.tab"
    ...
  END
END

The main difference between these 2 options is that in the first case the referred layer is contained by the layers collection of the map, so the layer will participate in the drawings. In the second case the referred layer is internal to the outer layer. It is supported that 2 or more layers connect to the same base layer so that the features will be served from the same cache repository. The base layer can reside in any place of the layer hierarchy.

In any case, the extension layer can also be implemented as a pluggable external layer. (CONNECTIONTYPE PLUGIN)

To achieve the desired functionality the following 3 providers will be implemented:

3.1 Feature caching provider

The purpose of this provider is to retain the features from the source layer in the memory so the subsequent fetches on the feature caching provider will be served from the internal cache. With the current implementation I’m planning to retain the shapes of last extent. When the subsequent shape retrieval refers to the same extent (or within that extent) the features will be served from the cache. At the moment I will not address caching features from multiple non overlapping extents and implement more sophisticated cache management (like size limit/expiration handling) but it might be the object of an enhancement in the future. All of the provider specific data will be placed in the layerinfo structure of the provider. The shapes in the cache will be stored in a hashtable. This provider will be implemented in a separate file (mapcache.c).

3.1.1 Shape retrieval options

This provider will be capable to populate the cache with all of the shapes in the given extent or only with the shapes in the resultcache of the source layer. The first one is the default option and the latter can be selected by the following PROCESSING directive:

PROCESSING "fetch_mode=selection"

The cache will be populated upon the WhichShapes call of the caching provider when the given rect falls outside of the previous one. We can also specify that the provider will retieve all the shapes of the current map extent regardless to the search area using the following setting:

PROCESSING "spatial_selection_mode=mapextent"

3.1.2 Items selection

The feature caching provider will store all of the items available regardless of which items have been selected externally (by using the WhichItems call). However the iteminfo will contain the indexes only of the requested items. The GetShape and the NextShape operations will copy only the requested subset of the items to the caller. This solution will provide the compatibility with any other existing provider so there’s no need to alter the common mapserver code from this aspect.

3.1.3 Support for the STYLEITEM “AUTO” option

By using the STYLEITEM “AUTO” option the provider can retrieve the classObj of every shapeObj from the data source. So as to keep on supporting this option the caching provider will be capable to cache these classObj-s as well as the shapeObj-s in a separate hashtable. When the STYLEITEM “AUTO” option is set on the feature caching provider the style cache will also be populated by calling the GetAutoStyle function to the source layer. The subsequent GetAutoStyle calls on the feature caching provider will retrieve the classObj-s from the style cache and provide a copy to the caller.

3.1.4 Support for the attribute filter

The feature caching provider will be compatible with the existing query operations (implemented in the msQueryBy[] functions in mapquery.c). For supporting the msQueryByAttributes the NextShape will use msEvalExpression before returning the shapes to the caller.

3.2 Geometry transformation provider

The geometry transformation provider (implemented in mapgeomtrans.c) will support transparent access to the source layer. Every vtable operations will be dispatched to corresponding vtable operation of the source layer. In addition some of the other layer properties might require to copy between the connected layers.

3.2.1 Items selection

This provider will serve the same subset of the items as the source layer provides. Therefore the InitIteminfo will copy the items and the numitems of the external layer to the source layer, like:

int msGeomTransLayerInitItemInfo( layerObj *layer )
{
    /* copying the item array an call the subsequent InitItemInfo*/
    return msLayerSetItems(((GeomTransLayerInfo*)layer->layerinfo)->srclayer, layer->items, layer->numitems);
}

And upon the GetItems call these values will be copied back to the external layer, like:

int msGeomTransLayerGetItems(layerObj *layer)
{
    /* copying the item array back */
    int result;
    GeomTransLayerInfo* layerinfo = (GeomTransLayerInfo*)layer->layerinfo;
    result = msLayerGetItems(layerinfo->srclayer);
    msLayerSetItems(layer, layerinfo->srclayer->items, layerinfo->srclayer->numitems);
    return result;
}

3.2.2 Applying the transformations

The proposed implementation will support most of the GEOS transformations supported by mapserver (buffer, convexhull, boundary, union, intersection, difference, symdifference). The transformations will be applied right before retrieving a shape to the caller. The desired transformation can be selected using a PROCESSING option, like:

PROCESSING "transformation=buffer"

Some of the transformations will accept further parameters:

PROCESSING "buffer_width=0.2"

Some of these operations will use 2 shapes to create the transformed shape. The reference shape of these transformations can be set externally using the WKT representation of a processing option:

PROCESSING "ref_shape=[WKT of the shape]"

I’m also planning to support retrieving this shape from the features collection of the layer and from an external layer as well.

3.3 Layer filter provider

The layer filter provider (implemented in mapfilter.c) will provide the shape retrieved from the source layer and will not apply any transformation on that. However some of the shapes will be skipped in the NextShape operations depending on the spatial and the attribute selection options. This provider ensures transparent access to the source layer (just as the previous one) but uses a cache for storing the selection shapes. The selection shapes will be retrieved from another layer which can be specified in the following processing option:

PROCESSING "selection_layer=[path to the layer]"

When populating the selection cache I’ll support to retrieve all of the shapes or only the shapes in the result cache as for the caching provider mentioned before:

PROCESSING "fetch_mode=selection"

4. Putting the things together (example)

In this chapter I’ll describe the solution of the scenario have been mentioned earlier. I’ll start with a simple map file definition with 2 layers the counties and the citypoints of the country:

MAP
      NAME "Hungary"
      STATUS ON
      EXTENT 421543.362603 47885.103526 933973.563202 367180.479761
      SIZE 800 600
      IMAGETYPE PNG
      IMAGECOLOR 255 255 255

      PROJECTION
              "proj=somerc"
              "lat_0=47.14439372222222"
              "lon_0=19.04857177777778"
              "x_0=650000"
              "y_0=200000"
              "ellps=GRS67"
              "units=m"
              "no_defs"
      END

      SYMBOL
          Name 'point'
          Type ELLIPSE
          POINTS
              1 1
          END
          FILLED TRUE
      END

      LAYER
              PROJECTION
                      "AUTO"
              END
              NAME "Hun_Counties"
              CONNECTIONTYPE OGR
              CONNECTION "Hun_Megye.TAB"
              STATUS default
              TYPE POLYGON
              LABELITEM "name"
              CLASS
                      TEMPLATE "query.html"
                      LABEL
                              SIZE medium
                              COLOR 64 128 64
                      END
                      STYLE
                              COLOR 208 255 208
                              OUTLINECOLOR 64 64 64
                      END
              END
      END
      LAYER
              PROJECTION
                      "AUTO"
              END
              NAME "Hun_CityPoints"
              CONNECTIONTYPE OGR
              CONNECTION "Hun_CityPoints.TAB"
              STATUS default
              TYPE POINT
              CLASS
                      STYLE
                              COLOR 255 0 0
                              SYMBOL "point"
                              SIZE 2
                      END
              END
      END
END

This map will look like this:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample.png

4.1 Adding the feature cache for the layers

Any of the layers might be cached by using the CACHE layer provider. The original provider might be nested inside the cache. The parameters related to the drawing should be specified for the outer layer, like:

LAYER
            PROJECTION
                    "AUTO"
            END
            NAME "Hun_Counties_cache"
            CONNECTIONTYPE CACHE
            CONNECTION "Hun_Counties"
            STATUS default
            TYPE POLYGON
            LABELITEM "name"
            CLASS
                    TEMPLATE "query.html"
                    LABEL
                            SIZE medium
                            COLOR 64 128 64
                    END
                    STYLE
                            COLOR 208 255 208
                            OUTLINECOLOR 64 64 64
                    END
            END
            LAYER
                PROJECTION
                        "AUTO"
                END
                NAME "Hun_Counties"
                CONNECTIONTYPE OGR
                CONNECTION "Hun_Megye.TAB"
                TYPE POLYGON
            END
    END

The Hun_Counties_cache layer is queryable and all of the existing methods are available to populate the resultcache. In my example I’ll use the mapscript C# drawquery example (implemented in drawquery.cs) to execute an attribute query and use the drawquery afterwards. The following command will be used along with this sample:

drawquery sample.map "('[Name]'='Tolna')" sample.png

Which passes the (‘[Name]’=’Tolna’) query string to the queryByAttributes of the first layer.

The result of the rendering will look like:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample1.png

The corresponding mapfile can be found here:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample1.map

4.2 Applying transformations on the counties

In the next step I’ll apply a cascaded GEOS convexhull and buffer operations on the first layer. The results will be rendered in a separate layer using the following definition:

LAYER
             NAME "selectionshape"
             CONNECTIONTYPE GEOMTRANS
             CONNECTION "simplify"
             STATUS default
             TYPE POLYGON
             TRANSPARENCY 50
             CLASS
                     STYLE
                             COLOR 64 255 64
                             OUTLINECOLOR 64 64 64
                     END
             END
             PROCESSING "transformation=buffer"
             PROCESSING "buffer_width=0.2"
             LAYER
                NAME "simplify"
                TYPE POLYGON
                CONNECTIONTYPE GEOMTRANS
                CONNECTION "/Hun_Counties_cache"
                PROCESSING "transformation=convexhull"
             END
     END

The result of the rendering will look like:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample2.png

The corresponding mapfile:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample2.map

Which is not the desired result since all of the polygons were transformed. So as to transform only the selected shape an additional cache should be specified by using the “fetch_mode=selection” processing option:

LAYER
            NAME "selectionshape"
            CONNECTIONTYPE GEOMTRANS
            CONNECTION "simplify"
            STATUS default
            TYPE POLYGON
            TRANSPARENCY 50
            CLASS
                    STYLE
                            COLOR 64 255 64
                            OUTLINECOLOR 64 64 64
                    END
            END
            PROCESSING "transformation=buffer"
            PROCESSING "buffer_width=0.2"
            LAYER
               NAME "simplify"
               TYPE POLYGON
               CONNECTIONTYPE GEOMTRANS
               CONNECTION "selectioncache"
               PROCESSING "transformation=convexhull"
               LAYER
                   NAME "selectioncache"
                   TYPE POLYGON
                   CONNECTIONTYPE CACHE
                   CONNECTION "/Hun_Counties_cache"
                   PROCESSING "fetch_mode=selection"
               END
            END
    END

And here is the result of the drawing:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample3.png

The corresponding mapfile:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample3.map

4.3 Using the transformed shape as the selection shape

My final goal is to select the features of the point layer using the transformed shape as the selection shape. Therefore I’ll have to use the layer filter provider on the point layer and setting the selection_layer to the transformed layer:

LAYER
         TYPE POINT
         CONNECTIONTYPE LAYERFILTER
         CONNECTION "Hun_CityPoints"
         NAME "selectedpoints"
         STATUS default
         PROCESSING "selection_layer=/selectionshape"
         CLASS
             STYLE
                     COLOR 255 0 0
                     SYMBOL "point"
                     SIZE 2
             END
         END
         LAYER
                 PROJECTION
                         "AUTO"
                 END
                 NAME "Hun_CityPoints"
                 CONNECTIONTYPE OGR
                 CONNECTION "Hun_CityPoints.TAB"
                 TYPE POINT
         END
     END

Here is the result:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample4.png

The corresponding mapfile:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample4.map

Note1: Without altering the map configuration I can modify the selection externally and the rendered image will reflect the changes automatically. For example I can select 2 counties using the query string: (‘[Name]’=’Tolna’ or ‘[Name]’=’Baranya’)

The resulting image can be found here:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample4a.png

Note2: If there’s no need to render the selection layer I can specify that configuration inline:

LAYER
        TYPE POINT
        CONNECTIONTYPE LAYERFILTER
        CONNECTION "Hun_CityPoints"
        NAME "selectedpoints"
        STATUS default
        PROCESSING "selection_layer=selectionshape"
        CLASS
            STYLE
                    COLOR 255 0 0
                    SYMBOL "point"
                    SIZE 2
            END
        END
        LAYER
                PROJECTION
                        "AUTO"
                END
                NAME "Hun_CityPoints"
                CONNECTIONTYPE OGR
                CONNECTION "Hun_CityPoints.TAB"
                TYPE POINT
        END
        LAYER
                NAME "selectionshape"
                CONNECTIONTYPE GEOMTRANS
                CONNECTION "simplify"
                STATUS default
                TYPE POLYGON
                TRANSPARENCY 50
                CLASS
                        STYLE
                                COLOR 64 255 64
                                OUTLINECOLOR 64 64 64
                        END
                END
                PROCESSING "transformation=buffer"
                PROCESSING "buffer_width=0.2"
                LAYER
                    NAME "simplify"
                    TYPE POLYGON
                    CONNECTIONTYPE GEOMTRANS
                    CONNECTION "selectioncache"
                    PROCESSING "transformation=convexhull"
                    LAYER
                        NAME "selectioncache"
                        TYPE POLYGON
                        CONNECTIONTYPE CACHE
                        CONNECTION "/Hun_Counties_cache"
                        PROCESSING "fetch_mode=selection"
                    END
                END
        END
    END

Here is the result:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample5.png

And the corresponding mapfile:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample5.map

And recall that because I’ve used a cache on the counties layer all of these renderings require to retrieve the shapes only once from the original provider. That was the primary objective of this RFC.

5. Modifying the mapserver core

To achieve the desired fuctionality the following major steps should be done in the mapserver core. The details of the proposed changes can be found here:

http://trac.osgeo.org/mapserver/attachment/ticket/2128/common.patch

5.1 Hashtable implementation

The current hashtable implementation (in maphash.c) will be generalized to be capable to store objects as well as strings. Currently the hashtable can store only string values. In addition the we will provide support for specifying the hashsize upon the construction of the hashtable. The objects will be stored in the hashtable by reference and the destroy function of the objects can also be specified externally. The following functions will be added to maphash.c

int initHashTableEx( hashTableObj *table, int hashSize );
void msClearHashItems( hashTableObj *table );
struct hashObj *msInsertHashTablePtr(hashTableObj *table, const char *key, const char *value);
struct hashObj *msFirstItemFromHashTable( hashTableObj *table);
struct hashObj *msNextItemFromHashTable( hashTableObj *table, struct hashObj *lastItem );
int msGetHashTableItemCount(hashTableObj *table);

initHashTableEx can be called to specify the hash size externally. Actually the original initHashTable will be implemented as calling initHashTableEx with MS_HASHSIZE. msClearHashItems will clear all of the elements of the hashtable but does not clear the items array. msInsertHashTablePtr provides the support for adding object by reference to the hashtable. msFirstItemFromHashTable and msNextItemFromHashTable will provide the iteration of the elements efficiently and will be called by the NextShape of the feature caching provider. msGetHashTableItemCount will retrieve the actual number of the hashtable items.

5.2 Extending the layerObj structure to support nesting the layers (map.h)

The sublayers in the layerObj structure will be stored as an array of layers.

typedef struct layer_obj {
  ...
  #ifndef SWIG
  struct layer_obj **layers;
  int numlayers; /* number of sublayers in layer */
  #endif /* SWIG */
  ...
} layerObj;

The parser will take care of loading the nested layers:

int loadLayer(layerObj *layer, mapObj *map)
{
  ...
  case (LAYER):
    if(layer->numlayers == 0)
        layer->layers = (layerObj**) malloc(sizeof(layerObj*));
    else
        layer->layers = (layerObj**) realloc(layer->layers, sizeof(layerObj*) * (layer->numlayers));
    layer->layers[layer->numlayers]=(layerObj*)malloc(sizeof(layerObj));
    if (layer->layers[layer->numlayers] == NULL) {
         msSetError(MS_MEMERR, "Malloc of a new layer failed.", "loadLayer()");
             return MS_FAILURE;
    }
    if(loadLayer(layer->layers[layer->numlayers], map) == -1) return MS_FAILURE;
    layer->layers[layer->numlayers]->index = layer->numlayers; /* save the index */
    ++layer->numlayers;
    break;
  ...
}

5.3 Adding a new built in data connection types (map.h)

A new data connection type is established for the new providers.

enum MS_CONNECTION_TYPE {MS_INLINE, MS_SHAPEFILE, MS_TILED_SHAPEFILE, MS_SDE, MS_OGR, MS_UNUSED_1, MS_POSTGIS, MS_WMS,   MS_ORACLESPATIAL, MS_WFS, MS_GRATICULE, MS_MYGIS, MS_RASTER, MS_PLUGIN, MS_GEOMTRANS, MS_LAYERFILTER, MS_CACHE };

The lexer will be modified to interpret the new connection types.

5.4 Support for destroying the persistent data of the providers (map.h, maplayer.c)

The providers would keep persistent data between the various Connection/Close operations on that layer so we should establish a mechanism to destroy this provider specific data by adding a new method to the layervtable:

void (*LayerDestroy)(layerObj *layer);

5.5 Vtable initialization for the new data providers (maplayer.c)

int
msInitializeVirtualTable(layerObj *layer)
{
  if (layer->baselayer)
      msInitializeVirtualTable(layer->baselayer);
  ...
  switch(layer->connectiontype) {
  ...
  case(MS_GEOMTRANS):
          return(msGeomTransLayerInitializeVirtualTable(layer));
          break;
      case(MS_LAYERFILTER):
          return(msFilterLayerInitializeVirtualTable(layer));
          break;
      case(MS_CACHE):
          return(msCacheLayerInitializeVirtualTable(layer));
          break;
  ...
  }
  ..
}

6. Files affected

The following files will be affected by this RFC:

maphash.c
maphash.h
maplayer.c
maplexer.l
mapfile.c
map.h
Makefile.vc
Makefile.in
mapcache.c  (new)
mapgeomtrans.c  (new)
mapfilter.c  (new)

7. Backwards compatibility issues

These changes will retain mapfile and mapscript backward compatibility.

8. Bug ID

The ticket for RFC-22a can be found here.

Bug 2128

9. Voting history

None