INSPIRE 2010 Conference (23 June, Metadata Session) Krakow, Poland

Long-term preservation for INSPIRE: a metadata framework and geo-portal implementation.
Arif Shaon and Andrew Woolf  (e-Science, Science and Technology Facilities Council)

Andrew Woolf presented.

Long-term Preservation of Spatial Information: Motivation for INSPIRE

• The INSPIRE directive requires global availability and uniform accessibility of heterogeneous environmental datasets across Europe through interoperability.

Interoperability does not always guarantee sustainability over the long-term. A key question to be answered – What happens to the data when a data provider ceases to exist?

•  A phenomenal deluge of spatial data over the last decade

– Triggered by the growing concerns over environmental problems, such as global climate changes.
– Ensuring sustained access to these data is becoming more difficult.

Efficient long-term preservation is required for both current and historical spatial data exposed through INSPIRE.

Main challenges:

Environmental data inherit the preservation challenges inherent to all digital information.

– Existing preservation approaches and standards, such as the OAIS Reference Model, should also be applicable to environmental data.

Environmental data adds to these:

– Highly structured and complex data models (“feature types”) that require special knowledge for accurate interpretation.
– Static data being replaced with dynamic web services, such as the OGC web services.
– Existing preservation approaches would need to be tailored to handle these added complexities.
– The work presented explored the applicability of the OAIS Reference Model to the preservation of environmental data.

Notable initiatives:

  • ESA announcing the Long Term Digital Preservation (LTDP) initiative for their Earth Observation datasets.
  • The National Geospatial Digital Archive (NGDA) project, funded under the National Digital Information Infrastructure and Preservation Program (NDIIPP) – approach specific to US-based data.
  • The Geospatial Electronic Records (GER) project: new metadata format introduced is incompatible with ISO 19115 – the metadata format required by European law and INSPIRE for describing European environmental data.
  • Some exploratory work by the Digital Preservation Coalition (DPC).

The Open Archival Information System (OAIS) Reference Model

• A widely adopted ISO Standard for long-term preservation of digital objects
• Defines an information model that needs to be captured for effective preservation

Preservation Aspects of INSPIRE SDI

What is missing:

• ISO 19115 is not curation aware
• Insufficient Representation Information (RI)
• Data annotation is not captured

What already exists:

• ISO 19115 – Good for resource discovery
• Controlled vocabulary for semantic metadata validation

• Ad-hoc approaches to data management and storage
• Not considered in this project

Preservation profile of ISO 19115

  • Extends “MD_ApplicationSchemaInformation” used to create a particular feature view of a source spatial dataset

•  Adds information about the mapping between a source data and its application schema.
•  Adds information about applications/software/services required to effectively apply the mapping.
•  Defines additional data specific RI (e.g. data formats, storage media), mainly in the form of web-accessible resources (e.g. URL).
•  Enables data providers to record RI in other formats than ISO 19115.

A Prototype Preservation-aware Geo-Portal

  • Implemented a web-based portal that demonstrates the underlying functions of a preservation-aware SDI

•  Based on GeoNetwork – a widely adopted open source and standards-based catalogue service, also used for the INSPIRE GeoPortal.

Key features:

  • Recording, editing, searching and viewing metadata in the Preservation profile of ISO 19115
  • Versioning of metadata
  • Annotation of both data and metadata through an intuitive and user-friendly wizard; captures annotation context in XPath for metadata records

A Prototype Preservation-aware Geo-Portal (Annotation Wizard)

1) Dataset overview
2) Added annotations
3) XPath-based annotation context
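
The idea of storing an annotation's context as an XPath can be illustrated with a minimal sketch. It is not the portal's implementation: the annotation structure, the sample record and the function name are assumptions made for illustration, and Python's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ISO 19139 encodings of ISO 19115 metadata.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# A deliberately tiny record (illustrative, not a complete ISO 19139 document).
RECORD = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example-record-1</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:dateStamp>
    <gco:Date>2010-06-23</gco:Date>
  </gmd:dateStamp>
</gmd:MD_Metadata>
"""

# Hypothetical annotation: free text plus the XPath of the element it refers to.
annotation = {
    "context": "./gmd:dateStamp/gco:Date",
    "text": "Date reflects the last edit, not creation.",
}


def resolve_annotation(record_xml, note):
    """Return the text of the element the annotation points at, or None."""
    root = ET.fromstring(record_xml.strip())
    target = root.find(note["context"], NS)
    return None if target is None else target.text


print(resolve_annotation(RECORD, annotation))
```

Keeping the context as a path rather than a copy of the value means the annotation can still be matched to the right element after the metadata record is versioned, provided the element itself survives.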

Conclusions and future directions:

  • Long-term preservation of both current and historical environmental data exposed through INSPIRE is highly important for monitoring and analysing climate change.

•  Awareness is growing in Europe with the emergence of ESA LTDP, although preservation is not addressed in the current INSPIRE directive.
•  The work presented investigates the requirements for a preservation-aware SDI for INSPIRE and presents a preservation profile of ISO 19115 that outlines the metadata requirements.
•  Future work would need to focus on the implementation of efficient and interoperable data preservation solutions for the INSPIRE data repositories.

Good presentation and an important reality to address now.  How many will wait until it’s too late? We’ll then be left with countless dataset orphans.

As part of the Go-Geo! project in 2006, we conducted spatial data audits at four UK universities. The results yielded almost 600 dataset titles and hundreds more orphan datasets: those without provenance, which could nevertheless be identified as spatial datasets from their file extensions (e.g. shapefiles, MID/MIF).

With regard to preservation, who will decide which spatial datasets are worth the resources required for preservation?  Without metadata for these 600 datasets from the audit, how do we know which ones should be reused and retained?

Adding Metadata to Maps and Styled Layers to Improve Maps Efficiency.
Benedicte Bucher, Sebastien Mustiere, Laurence Jolivet and Jeremy Renard

A presentation from the Institut Geographique National (IGN).

Overlaying map layers that share similar colours can make maps illegible and difficult to interpret.

Not enough colours to represent these features.

Need to formulate objective rules about map design: constraints related to legibility, properties of graphical variables and some conventions.

Facilitating overlays:

Summing the meanings of the layers and checking whether the combined styles fit that meaning.

Metadata to describe each layer’s meaning (= the information content and how it should be conveyed (reading order))

[Diagram: from meaning (main theme, scale, geographic entities) through feature representation and layer representation to graphical signs (styles).]

•  Relationships between styles and meaning (e.g. “same hue means same nature”).
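
A rule such as “same hue means same nature” can be checked mechanically once each styled layer carries metadata about its meaning. The sketch below is only an illustration of that idea, not the method presented by IGN: the layer tuples, the `nature` label and the 20-degree threshold are all assumptions.

```python
import colorsys
from itertools import combinations


def hue_degrees(rgb):
    """Hue of an 8-bit RGB colour, in degrees (0-360)."""
    r, g, b = (c / 255 for c in rgb)
    h, _lightness, _saturation = colorsys.rgb_to_hls(r, g, b)
    return h * 360


def hue_distance(a, b):
    """Shortest angular distance between two hues on the colour wheel."""
    d = abs(hue_degrees(a) - hue_degrees(b)) % 360
    return min(d, 360 - d)


def conflicting_layers(layers, threshold=20.0):
    """Return pairs of layers that differ in nature but share a similar hue.

    layers: list of (name, nature, rgb) tuples (a hypothetical structure).
    Saturation and lightness are ignored, so greys are not handled well.
    """
    conflicts = []
    for (n1, nat1, c1), (n2, nat2, c2) in combinations(layers, 2):
        if nat1 != nat2 and hue_distance(c1, c2) < threshold:
            conflicts.append((n1, n2))
    return conflicts


layers = [
    ("forest", "vegetation", (34, 139, 34)),
    ("park", "vegetation", (60, 179, 113)),
    ("contamination", "hazard", (50, 205, 50)),
    ("roads", "transport", (220, 20, 60)),
]
print(conflicting_layers(layers))
```

Here the forest and contamination layers are flagged because both are drawn in the same green hue although they describe different things; changing the colour of one of them would restore legibility.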

Building a catalogue

Extrapolating European topographic legends to IGN data and qualifying the maps.

  • three datasets (different kinds of area)
  • legend adapted to IGN simplified model
  • the main background colour remained the same (as in the country the legend comes from).
  • Other topographic legends have been designed based on colour schemes extracted from a colour harmony book.
  • Keywords attached to the colour schemes in the book.

Examples provided:

Professional, neutral, serious (dark green dominant on map)

Royal, strong, brave, authoritative, noble, far east (light green dominant on map)

Enthusiastic, young, energetic, innovative (pink dominant on map)

Other topographic legends have been designed based on … colour schemes and grammar rules extracted from French artists’ masterpieces.

Examples showing mood colours on maps provided.

Conclusions:

Facilitating layer overlays requires clarity about the expected themes, reading order, relationships between themes, etc. of the resulting map.

Change colours (or change predefined layers) to maintain legibility and avoid any misinterpretation of graphical relationship.

The Central Catalogue Service in Germany
Juergen Walther

Presentation from the Federal Agency for Cartography and Geodesy.

A centralised catalogue service interoperable with 16 federal states running 12 discovery services; also two thematic network services.

Creating an abstract test suite for OGC CSW 2.0.2 AP ISO 1.0.

Test-Software: TEAM Engine (Test, Evaluation and Measurement Engine)

Documentation:

1. Document: Abstract Test Suite for OGC CSW 2.0.2 AP ISO 1.0
2. Document: CTL scripts (CTL = Compliance Test Language)
3. Document: sample metadata (XML-files ISO 19139)

http://www.gdi-de.org/de_neu/test/navl_test.html

The conformance test for AP ISO 1.0 (Level 1: Discovery) is based on the OGC TEAM Engine. More than three catalogues are compliant (one of them open source), so the test can be put forward as an official OGC test.

Advantages:

  • one access point
  • quick, high quality, cost efficient
  • approved OGC compliance
  • IR network services compliant
  • test with INSPIRE portal
  • single point of maintenance

Summary:

  • system for interdisciplinary collection, consolidation, contribution and search of metadata.
  • metadata from distributed catalogues of municipality, state and country level
  • central German metadata node for INSPIRE and German SDI
  • indexing of the metadata for a quick search
  • storage of the complete, original xml-files
  • ranking is realised for the simple search
  • duplicate datasets are deleted

Result: a consolidated, efficient and high performance data access for INSPIRE, GEOSS and the German SDI.

Comparative Quality Assessment of Metadata. Two Regional SDI case studies. (IDEC & IDE-CLM).
Paula Díaz, Joan Masó and Jordi Guimet
Presentation from the Department of Geography at Universitat Autonoma de Barcelona.

Aim of the study:

  • Detect and analyse errors in the metadata sets
  • Determine the nature of these errors
  • Determine their percentage of presence
  • Make recommendations for avoiding them

THE IDEC (Catalonia)

  • Created in 2002
  • The Metadata catalogue.
  • The program MetaD: creation and editing of metadata sets.

IDE-CLM (Castilla la Mancha)

  • Created in 2006. Under the INSPIRE Directive.

Standardisation facilitates:

  • Interoperability
  • Comparison

This study is based on three standards:

  • ISO 19115: to establish the elements as a basis for study
  • ISO 19139: to understand the XML documents
  • OGC-CSW: to download XML metadata documents

At the time of this study, only IDEC and IDE-CLM had OGC catalogues.
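
As a rough sketch of the download step, a CSW 2.0.2 catalogue can be asked for a single record in the ISO 19139 encoding with a GetRecordById request. The endpoint below is a placeholder and the function name is an assumption; the study's actual harvesting procedure was not described.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the catalogue's real CSW URL.
CSW_ENDPOINT = "https://catalogue.example.org/csw"


def get_record_by_id_url(record_id, endpoint=CSW_ENDPOINT):
    """Build a CSW 2.0.2 GetRecordById request (KVP encoding) for one record."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecordById",
        "id": record_id,
        "elementSetName": "full",
        # Ask for the ISO 19139 (gmd) encoding rather than the default csw:Record.
        "outputSchema": "http://www.isotc211.org/2005/gmd",
    }
    return endpoint + "?" + urlencode(params)


print(get_record_by_id_url("abc-123"))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) should return the XML metadata document, assuming the catalogue supports the ISO application profile.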

Creation of a metadata database

  • Metadata Sets in XML
  • Database containing all the metadata sets
  • Extraction of all the mandatory elements and also other optional elements

Database:

  • Columns: Mandatory and optional elements.
  • Rows: XML files identified by their UUID.

The IDEC:

  • 35 columns and 14,616 rows.

The IDE-CLM:

  • 35 columns and 98 rows.

All the elements extracted are mandatory under the INSPIRE directive regarding metadata.
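
The database layout (one row per XML file, one column per element) can be sketched as follows. This is an assumed reconstruction, not the authors' tool: only four of the 35 columns are shown, the column names are hypothetical, and the element paths are the usual ISO 19139 locations for these elements.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Column name -> element path relative to gmd:MD_Metadata.
ID_INFO = "gmd:identificationInfo/gmd:MD_DataIdentification/"
COLUMNS = {
    "uuid": "gmd:fileIdentifier/gco:CharacterString",
    "metadata_date": "gmd:dateStamp/gco:Date",
    "title": ID_INFO + "gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
    "topic_category": ID_INFO + "gmd:topicCategory/gmd:MD_TopicCategoryCode",
}


def record_to_row(xml_text):
    """Flatten one metadata document into a row; None marks a blank or missing element."""
    root = ET.fromstring(xml_text)
    row = {}
    for column, path in COLUMNS.items():
        value = root.findtext(path, default="", namespaces=NS)
        row[column] = value.strip() or None
    return row
```

A `None` cell then corresponds to an element that is absent or left blank, which is the kind of gap the quality analysis counts. Records that encode the date stamp as `gco:DateTime` would need an extra path.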

Quality Analysis of metadata sets

40,000 metadata records.

Errors in the metadata sets of the regional (autonomous community) catalogues.

  • Lack of compliance with the ISO 19115 requirements.

Some examples of mandatory elements (count, with percentage in brackets):

Error                                    IDEC             IDE-CLM
Lack of metadata date                    353 (2.42)       *****
Lack of dataset dates                    1,779 (12.17)    13 (36.7)
Lack of extent                           33 (0.23)        29 (29.21)
Lack of creator contact                  39 (0.27)        2 (2.04)
Scale factors (inconsistent with map)    341 (2.33)       9 (0.09)
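
The bracketed IDEC figures appear to be each count expressed as a percentage of the 14,616 IDEC records. The short check below reproduces them under that assumption; the function name is illustrative.

```python
IDEC_RECORDS = 14616  # rows reported for the IDEC database

# Counts reported for IDEC in the notes above.
IDEC_COUNTS = {
    "Lack of metadata date": 353,
    "Lack of dataset dates": 1779,
    "Lack of extent": 33,
    "Lack of creator contact": 39,
    "Scale factors (inconsistent with map)": 341,
}


def percent_of_records(count, total):
    """Share of records affected, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


for label, count in IDEC_COUNTS.items():
    print(f"{label}: {percent_of_records(count, IDEC_RECORDS)}%")
```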

Reasons for the presence of errors in the metadata.

Three main reasons:

  • Lack of accurate information on the part of the metadata creator (such as the date of creation of the dataset)
  • The difficulty of determining the information required (e.g. the scale of tabular information with x,y positions)
  • Ignorance of certain factors (e.g. processes)

Further observations:

  • Methods for creating metadata are not exempt from generating errors.
  • There is a high percentage of error in the manual compilation of metadata elements.

Recommendations:

  • The most immediate recommendation is to supply the missing mandatory elements.
  • The SDI can notify the metadata providers to help them resolve errors.
  • We recommend establishing common rules for generic creation of metadata titles.
  • Use a thesaurus for the selection of keywords.

Conclusions:

It’s possible to carry out a systematic review of the metadata sets of an SDI in order to:

  • Detect errors, weaknesses, or lapses in good practice.
  • Determine the organisations responsible for a specific problem.

Other conclusions:

  • Periodic quality checks of the metadata can be made to detect errors or omissions.
  • INSPIRE is more demanding than ISO 19115 with respect to the completeness of the metadata.
  • This analysis reveals the presence of different kinds of errors in the metadata sets.
  • Many metadata sets contain errors that could not have been introduced involuntarily by common metadata tools.
  • There is a lower quality of description in the optional elements.
  • There is a need to implement more quality control procedures.
  • The average error for all metadata sets is around 3.84% in the IDEC and 11.73% in the IDE-CLM.
  • This analysis, applied to regional SDIs, shows that quality is a compromise between agility for the providers who create metadata and the needs of end users who want as much detailed information as possible.

Common errors found in the SDIs (%)

Error                                          IDEC     IDE-CLM
Metadata date blank                            2.42     0
Data dates blank (all three)                   12.17    37
Creation date later than metadata date         3.36     0
Creation date “1900-01-01”                     9.48     0
Topic category not in codelist                 9.7      0
Topic category blank                           3.41     3
Contact information blank                      0.27     2
Geographic extent not in angles (lat/long)     0.18     60
Minimum coordinate greater than the maximum    0.01     1
Data language blank                            2.44     26
Incorrect metadata language                    0.35     3
Inconsistent scale factors                     2.33     9
Average error                                  3.84     11.71
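
Several of the errors listed above lend themselves to automated checks. The following is a minimal sketch of such checks, not the procedure used in the study: the row keys and error labels are hypothetical, and each row is assumed to hold already-parsed dates and bounding-box coordinates.

```python
from datetime import date

PLACEHOLDER_DATE = date(1900, 1, 1)  # default value that signals an unfilled date


def check_record(row):
    """Return a list of error labels for one metadata row (dict with assumed keys)."""
    errors = []
    metadata_date = row.get("metadata_date")
    creation_date = row.get("creation_date")

    if metadata_date is None:
        errors.append("metadata date blank")
    if creation_date == PLACEHOLDER_DATE:
        errors.append("creation date is 1900-01-01")
    if metadata_date and creation_date and creation_date > metadata_date:
        errors.append("creation date later than metadata date")

    west, east = row.get("west"), row.get("east")
    south, north = row.get("south"), row.get("north")
    if None not in (west, east, south, north):
        in_degrees = (-180 <= west <= 180 and -180 <= east <= 180
                      and -90 <= south <= 90 and -90 <= north <= 90)
        if not in_degrees:
            errors.append("extent not in lat/long degrees")
        # west > east is legitimate for boxes crossing the antimeridian,
        # which is not expected for regional Spanish data.
        if south > north or west > east:
            errors.append("minimum coordinate greater than maximum")
    return errors
```

Running such checks periodically over every row, and dividing the number of flagged rows by the total, would give error rates comparable to those in the table.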
