Data Format Types
Each type of data is made up of a data source and one or more layers. The following two definitions apply to both MapServer and OGR.
Data Source - a group of layers stored in a common repository. This may be a file that handles several layers within it, or a folder that has several files.
Layer - a sub-set of a data source often containing information in one type of vector format (point, line, polygon).
There are three types of mapping and GIS data formats. Each type is handled differently. Below are the types and some example formats:
File-based - Shapefiles, Microstation Design Files (DGN), GeoTIFF images
Directory-based - ESRI ArcInfo Coverages, US Census TIGER
Database connections - PostGIS, ESRI ArcSDE, MySQL
File-based data consists of one or more files stored in any arbitrary folder. In many cases a single file is used (e.g. DGN), but ESRI Shapefiles, for example, consist of at least three files, each with a different filename extension: SHP (geometry), DBF (attributes) and SHX (index). All three files are required because each performs a different task internally.
Filenames usually serve as the data source name and contain layers that may or may not be obvious from the filename. In Shapefiles, for example, there is one data source per shapefile and one layer which has the same name as that of the file.
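As a minimal sketch of the Shapefile rules above, the following Python checks that the three required files are present and derives the layer name from the filename. The function names and the `roads.shp` path are hypothetical and chosen only for illustration; this is not how MapServer or OGR performs the check.

```python
from pathlib import Path

# Extensions of the three mandatory files named in the text.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions for which no file exists next to shp_path."""
    base = Path(shp_path)
    missing = []
    for ext in REQUIRED_EXTENSIONS:
        # Accept the lower-case and the upper-case spelling of each extension.
        candidates = [base.with_suffix(ext), base.with_suffix(ext.upper())]
        if not any(candidate.exists() for candidate in candidates):
            missing.append(ext)
    return missing


def layer_name(shp_path):
    """The single layer is named after the file, without its extension."""
    return Path(shp_path).stem
```

For a hypothetical `data/roads.shp`, `layer_name` gives `roads`, and `missing_shapefile_parts` returns an empty list only when `roads.shp`, `roads.shx` and `roads.dbf` are all present.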
Directory-based data consists of one or more files stored in a particular way within a parent folder. In some cases (e.g. Coverages) they may also require additional folders in other locations in the file tree in order to be accessed. The directory itself may be the data source. Different files within the directory often represent the layers of data available.
For example, ESRI ArcInfo Coverages consist of more than one file with an ADF file extension, within a folder. The PAL.ADF file represents the polygon data. ARC.ADF holds the arc or line string data. The folder is the data source and each ADF file is a layer.
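The folder-as-data-source idea can be sketched in a few lines of Python. This follows the simplified description above and treats every ADF file in the folder as one layer; the function name and the lower-cased layer names are assumptions for illustration, and real tools such as OGR may name or filter the layers differently.

```python
from pathlib import Path


def coverage_layers(coverage_dir):
    """Map a layer name (lower-cased file stem) to each ADF file in the folder."""
    folder = Path(coverage_dir)
    layers = {}
    for entry in sorted(folder.iterdir()):
        # Only regular files with an ADF extension count as layers here.
        if entry.is_file() and entry.suffix.lower() == ".adf":
            layers[entry.stem.lower()] = entry.name
    return layers
```

For a folder containing PAL.ADF and ARC.ADF, the result has one entry for the polygon layer and one for the arc layer, while the folder path itself plays the role of the data source name.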
Database Connections are very similar to file and directory-based structures in one respect: they provide geographic coordinate data for MapServer to interpret. That may be oversimplifying what is happening inside MapServer, but in essence all you need is access to the coordinates making up the vector datasets.
Database connections provide a stream of coordinate data that is temporarily stored (e.g. in memory) and read by MapServer to create the map. Other attribute or tabular data may also be required, but the focus of this guide is coordinate data.
One important distinction between databases must be made. The databases discussed here are spatial databases, those which can hold geographic data in its own data type. This is opposed to strictly tabular databases, which cannot hold geographic coordinates in the same way. It is possible to store some very simple coordinate data in regular tables, but for anything beyond the simplest use a spatial database is required. There are spatial extensions to many databases (Open Source and commercial). One of the most robust is the PostGIS extension to the PostgreSQL database. This database not only allows the storage of geographic data, but also allows the manipulation of that data using SQL commands. Another Open Source database with spatial capabilities is MySQL.
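To illustrate what "very simple coordinate data in regular tables" can look like, here is a small sketch using Python's built-in `sqlite3` module with no spatial extension. The `wells` table, its columns and the sample points are invented for this example.

```python
import sqlite3

# A plain, non-spatial table: coordinates are just two numeric columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wells (name TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO wells VALUES (?, ?, ?)",
    [("A", 1.0, 1.0), ("B", 5.0, 5.0), ("C", 2.5, 3.0)],
)


def points_in_box(conn, xmin, ymin, xmax, ymax):
    """Return the names of points inside a bounding box (edges inclusive)."""
    rows = conn.execute(
        "SELECT name FROM wells "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY name",
        (xmin, xmax, ymin, ymax),
    )
    return [row[0] for row in rows]
```

A bounding-box filter on points is about as far as this approach goes comfortably. Lines, polygons and operations on them do not fit into a pair of numeric columns, which is where a dedicated geometry type in a spatial database becomes necessary.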
Connections to databases usually consist of the following pieces of connection information:
Host - Directions to the server or computer hosting the database.
Database name - The name of the database you wish to access that is running on the host.
User name / password - Access privileges are usually restricted by user.
Some databases (e.g. Oracle) use a name service identifier that includes both the host and database names.
Access to specific pieces of coordinate data usually requires:
Table/View name - The name of the table or view holding the coordinate data.
Geographic column name - Where the geometry or coordinates are stored.
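The pieces of information listed above can be assembled as follows. This is a sketch only: it builds a keyword/value connection string of the kind PostgreSQL clients accept, and a "column from table" statement resembling the form used for PostGIS layers in a MapServer mapfile. The host, database, user, password, table and column values are all hypothetical, and the default port of 5432 is an assumption about a standard PostgreSQL installation.

```python
def connection_string(host, dbname, user, password, port=5432):
    """Join the connection details into a space-separated key=value string."""
    parts = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
    }
    # No quoting is applied, so values containing spaces are not handled here.
    return " ".join(f"{key}={value}" for key, value in parts.items())


def data_statement(geometry_column, table):
    """Name the geographic column and the table or view that holds it."""
    return f"{geometry_column} from {table}"
```

For example, `connection_string("localhost", "gis", "mapuser", "secret")` covers the host, database name and user credentials, and `data_statement("geom", "roads")` identifies the table and its geographic column.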