This page documents the Best Practice guidelines for CLARINO metadata. CLARINO follows CLARIN’s guidelines and recommends using the CMDI metadata format. The metadata editor COMEDI, developed by Paul Meurer for CLARINO, lets you create and edit any CMDI profile from the CLARIN Component Registry. These guidelines hold irrespective of whether metadata is created with COMEDI or by any other means.
Profiles and components
Which profiles does CLARINO recommend?
For CLARINO partners, the CLARINO-recommended profiles and components are fully documented at Agora (including discussion forums, meeting minutes from discussions, etc.). In brief, the following profiles are recommended (and are also found in COMEDI's drop-down menu):
- corpusProfile (clarin.eu:cr1:p_1407745711925) - describes corpora of all types and modalities.
- lexicalProfile (clarin.eu:cr1:p_1428388179419) - describes lexical resources.
- teiProfile (clarin.eu:cr1:p_1422885449322) - for cases where using teiHeader is preferred.
- toolProfile (clarin.eu:cr1:p_1422885449331) - describes software (NOTE: has not been extensively tested as of Oct. 2015)
You are free to develop your own profile; in that case, try to reuse existing components to the extent possible.
NOTE: In CLARINO, the component ResourceCommonInfo is obligatory. This component contains administrative information that we find relevant for any type of resource. By using the component ResourceCommonInfo, the national metadata registry at the National Library can enable the user of the catalogue to search and filter efficiently among the full set of resources and tools.
Language / linguality
The CLARINO best practice is that Norwegian bokmål and nynorsk are not treated as language varieties in metadata, but as languages (in virtue of having their own ISOCAT language codes, just like the language Norwegian). Since ISO has not created a hierarchy to express that bokmål and nynorsk are subterms of Norwegian, we have the following best practice to ensure that the resource can be found (and filtered away) as both Norwegian (code: no) and Norwegian bokmål/nynorsk (nb/nn):
1. For resources in Norwegian, always create a component for Norwegian, code: "no".
2. If it is relevant to also specify Norwegian bokmål/nynorsk, create an additional component for Norwegian bokmål (code: "nb") and/or Norwegian nynorsk (code: "nn").
Let linguality then be monolingual, to express that it is all Norwegian.
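The rule above can be sketched in a few lines of Python. This is only an illustration of the logic: the function name and the dictionary keys are hypothetical and do not correspond to actual CMDI element names.

```python
# Sketch only: the keys "linguality", "languages", "code" and "name" are
# illustrative placeholders, not actual CMDI component or element names.
LANGUAGE_NAMES = {
    "no": "Norwegian",
    "nb": "Norwegian bokmål",
    "nn": "Norwegian nynorsk",
}


def norwegian_language_entries(written_standards=()):
    """Return language entries for a Norwegian resource.

    The entry for "no" is always created first; "nb" and/or "nn" are
    added only when the caller specifies them.
    """
    codes = ["no"]
    for code in written_standards:
        if code not in ("nb", "nn"):
            raise ValueError(f"expected 'nb' or 'nn', got {code!r}")
        if code not in codes:
            codes.append(code)
    return {
        # Still monolingual: everything listed is Norwegian.
        "linguality": "monolingual",
        "languages": [{"code": c, "name": LANGUAGE_NAMES[c]} for c in codes],
    }


entries = norwegian_language_entries(["nb"])
```

With this sketch, a bokmål resource gets two language entries ("no" and "nb") while its linguality stays monolingual.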
PID, Self link, URLs
- All resources/files considered to be stable and directly accessible on the Internet should get a PID, i.e. a persistent identifier.
- A PID in CLARIN should be a handle.
- Self link (metadata PID): The metadata file recording a resource should get a Self link, i.e. a metadata PID. This Self link is placed in the CMDI metadata file's header; see MdSelfLink.
- Resource PID: Assign a PID to the resource described in the metadata file under Component ResourceCommonInfo.IdentificationInfo. In COMEDI, the PID in this field will resolve to the first URL in the field Component ResourceCommonInfo.URL.
Note that the metadata Self link must resolve to a URL supporting content negotiation, specifically via the HTTP Accept header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1). It must be possible to deliver two media types:
- text/html for a human-readable representation in a browser.
- application/x-cmdi+xml for a machine-readable representation of CMDI metadata
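As a rough illustration of what content negotiation looks like from the client side, the following Python sketch builds (but does not send) a request that asks for one of the two media types. The function name is hypothetical, and the URL is only the example handle URL used elsewhere on this page, serving as a placeholder rather than a known Self link.

```python
import urllib.request

CMDI_MEDIA_TYPE = "application/x-cmdi+xml"
HTML_MEDIA_TYPE = "text/html"


def build_self_link_request(url, media_type):
    """Build a GET request whose Accept header asks for one media type."""
    return urllib.request.Request(url, headers={"Accept": media_type})


# Placeholder URL; substitute the resolved URL of your own Self link.
request = build_self_link_request(
    "http://hdl.handle.net/10037.1/10005", CMDI_MEDIA_TYPE
)

# To actually test a server, send the request and inspect the response type:
#   with urllib.request.urlopen(request) as response:
#       print(response.headers.get_content_type())
```

Building the same request with HTML_MEDIA_TYPE should yield the human-readable page if the server negotiates as required.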
Question: When filling in metadata about a resource/referring to a resource, should I use the handle (e.g. hdl:10037.1/10005) or the URL (e.g. http://hdl.handle.net/10037.1/10005)?
A handle is a PID, but is not necessarily a URL. The transition from a handle to a URL must be fixed by the application that visualizes your metadata.
So: use the handle, e.g. hdl:10037.1/10005.
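An application that displays the metadata could turn a stored handle into a clickable URL along these lines. This is a minimal sketch using the resolver address from the example above; the function name is hypothetical.

```python
HANDLE_PREFIX = "hdl:"
RESOLVER_BASE = "http://hdl.handle.net/"


def handle_to_url(handle):
    """Convert a handle such as 'hdl:10037.1/10005' to a resolver URL."""
    if not handle.startswith(HANDLE_PREFIX):
        raise ValueError(f"not a handle: {handle!r}")
    return RESOLVER_BASE + handle[len(HANDLE_PREFIX):]


url = handle_to_url("hdl:10037.1/10005")
# url is "http://hdl.handle.net/10037.1/10005"
```

Keeping the bare handle in the metadata means the resolver address can change without the metadata having to be edited.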
How to refer to a resource?
To refer to a resource, use the metadata Self link (if available). Otherwise, use the resource PID. If neither is available, use an ordinary URL.
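The order of preference can be written as a small helper. This is a sketch only; the function and parameter names are hypothetical.

```python
def choose_reference(self_link=None, resource_pid=None, url=None):
    """Return the preferred identifier for referring to a resource.

    Preference order: metadata Self link, then resource PID, then URL.
    """
    for candidate in (self_link, resource_pid, url):
        if candidate:
            return candidate
    raise ValueError("no identifier available for this resource")


# A resource with a PID but no metadata Self link:
reference = choose_reference(resource_pid="hdl:10037.1/10005",
                             url="http://hdl.handle.net/10037.1/10005")
```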
Expressing relations between resources
Example: I am creating a metadata file for a Collection which contains four individual resources. I have created metadata files for each of the four individual resources (which have individual metadata Self links and resource PIDs). I now want to express a part-of relation between the collection and the four individual, external resources. (‘External’ here means that each resource has its own metadata file outside the current metadata file.)
To express that “The resource described in the current metadata file has the following, external parts”:
1. Go to the Resources section to the left of the top-level components.
Fig. 1: Resources section
2. Click the ‘+’ button adjacent to Resource proxy list. This will open text fields for typing.
3. In the first field, provide the metadata Self link of the external resource. If the external resource has a metadata file in COMEDI, click Select PID to select it from a drop-down menu. If the external resource is not documented with metadata, provide its resource PID (if it has one), or, as a last resort, its URL.
4. In the id: field, type an identifier for the resource.
5. In the Type field, select the appropriate type from a drop-down menu. (If the provided PID is a metadata Self link, choose type Metadata; if the provided PID is a resource PID, choose type Resource.)
6. In the Mime type field, type the appropriate mime type. (For example, CMDI metadata files have mime type application/x-cmdi+xml.)
7. Add as many PIDs as you have external resource parts.
Fig. 2: A resource proxy list pointing to resource parts with their own metadata files.
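For readers who create metadata by means other than COMEDI, the following Python sketch shows roughly what the resulting resource proxy list looks like in a CMDI file. It assumes the standard CMDI envelope element names (ResourceProxyList, ResourceProxy, ResourceType, ResourceRef) and omits the CMDI XML namespace for brevity; the handles and ids are made-up placeholders, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_resource_proxy_list(parts):
    """Build a ResourceProxyList element.

    parts: iterable of (proxy_id, ref, resource_type, mime_type) tuples,
    where resource_type is "Metadata" or "Resource".
    """
    proxy_list = ET.Element("ResourceProxyList")
    for proxy_id, ref, resource_type, mime_type in parts:
        proxy = ET.SubElement(proxy_list, "ResourceProxy", {"id": proxy_id})
        type_element = ET.SubElement(proxy, "ResourceType")
        if mime_type:
            type_element.set("mimetype", mime_type)
        type_element.text = resource_type
        ET.SubElement(proxy, "ResourceRef").text = ref
    return proxy_list


# Placeholder values: these handles do not refer to real resources.
parts = [
    ("part1", "hdl:0000/part-1-metadata", "Metadata", "application/x-cmdi+xml"),
    ("part2", "hdl:0000/part-2-metadata", "Metadata", "application/x-cmdi+xml"),
]
proxy_list_xml = ET.tostring(build_resource_proxy_list(parts), encoding="unicode")
```

Each tuple corresponds to one pass through steps 3 to 6 above: the reference, the id, the type, and the mime type.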
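To express that “The resource described in the current metadata file is part of another, external resource”:
1. Go to the Resources section to the left of the top-level components (cf. fig. 1).
2. Click the ‘+’ button adjacent to Is part of list. This will open a text field for typing the PID of the ‘mother’ resource.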
3. If the ‘mother’ resource has its own metadata file, type its metadata Self Link. Else, type its resource PID, or, as a last resort, its URL.
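Outside COMEDI, the same relation could be written into the CMDI file roughly as follows. This is a sketch that assumes the CMDI envelope element names IsPartOfList and IsPartOf and omits the XML namespace; the handle is a made-up placeholder and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_is_part_of_list(parent_refs):
    """Build an IsPartOfList element with one IsPartOf child per parent."""
    part_of_list = ET.Element("IsPartOfList")
    for ref in parent_refs:
        ET.SubElement(part_of_list, "IsPartOf").text = ref
    return part_of_list


# Placeholder handle for the 'mother' resource's metadata Self link.
is_part_of_xml = ET.tostring(
    build_is_part_of_list(["hdl:0000/collection-metadata"]),
    encoding="unicode",
)
```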
The Component ResourceCommonInfo