Understanding ISCC Schemas

Digital content identification requires structured metadata: data that describes what a piece of content is, who created it, and how it may be used. Without a shared vocabulary, every platform invents its own metadata format, and interoperability breaks down.

ISCC schemas solve this by defining a single, machine-readable vocabulary for digital content metadata. They are part of the broader ISCC ecosystem, which provides content-derived identifiers standardized as ISO 24138:2024.

The iscc-schema package is the canonical source for these definitions. From YAML-based OpenAPI 3.1.0 source files, the build pipeline generates:

  • JSON Schema for validating metadata objects
  • JSON-LD Context for semantic mappings to established vocabularies (schema.org, Dublin Core)
  • Python Models (Pydantic) for creating and validating metadata in Python

JSON Schema and JSON-LD

ISCC metadata uses two complementary standards that serve different needs:

JSON Schema is the primary interface for most developers. It defines field types, constraints, and defaults for validation — but it also carries human-readable field descriptions, examples, and documentation inline. This makes it self-documenting: a developer (or an AI agent) can read the schema and understand every field without consulting external resources.

JSON-LD Context maps compact field names like name or creator to global semantic URIs (e.g., http://schema.org/name), enabling ISCC metadata to participate in the Linked Data web. This is valuable for semantic interoperability across systems, though it requires dereferencing URIs to access human-readable descriptions.
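
As a rough illustration of what a context does, the sketch below expands compact keys into full URIs using a tiny hand-written mapping. Only the http://schema.org/name mapping comes from this guide; the creator URI, the sample creator value, and the expand_keys helper are assumptions made for illustration. A real JSON-LD processor does considerably more (nested contexts, typed values, keyword handling).

```python
# Toy context: the "name" mapping is taken from this guide; the
# "creator" URI is an assumed value used only for illustration.
TOY_CONTEXT = {
    "name": "http://schema.org/name",
    "creator": "http://schema.org/creator",
}


def expand_keys(data: dict, context: dict) -> dict:
    """Replace compact keys with their URIs; leave unmapped keys untouched."""
    return {context.get(key, key): value for key, value in data.items()}


compact = {"name": "The Never Ending Story", "creator": "Example Author"}
expanded = expand_keys(compact, TOY_CONTEXT)
print(expanded)
```

Keys without a mapping pass through unchanged, which is why unmapped fields carry no global meaning for a Linked Data consumer.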

Both are generated from the same YAML source definitions, so they are always in sync. Since v0.5.0, JSON Schema files embed the JSON-LD context directly - one file provides validation rules, inline documentation, and semantic mappings.

For most use cases, plain JSON with a $schema reference is all you need. The schema handles validation and documentation. When semantic interoperability matters, the embedded JSON-LD context is always available - either carried in the data directly, or recovered from the schema on demand (see Schema-Driven Context Recovery below).

Schema-Driven Context Recovery

iscc-schema v0.5.0 introduced schema-driven context recovery: reconstructing full JSON-LD from compact, plain JSON data.

Plain JSON is compact and easy to work with, but a field called name could mean anything without semantic context. JSON-LD fixes this with an @context that maps fields to global identifiers - but carrying the full context in every object adds bulk. Since the $schema field already points to the JSON Schema describing the data, the schema can embed the @context too. Any consumer can then recover the full JSON-LD context from the schema reference alone.

How It Works

flowchart LR
    A["Plain JSON"] --> B["Read $schema"]
    B --> C["Extract @context\nfrom schema"]
    C --> D["Valid JSON-LD"]

  1. Read the $schema URL from the data (or infer it from @type)
  2. Look up the embedded @context for that schema
  3. Inject the context into the data, which is now valid JSON-LD

Python Example

from iscc_schema import recover_context

# Plain JSON data without @context
data = {
    "$schema": "http://purl.org/iscc/schema/0.8.0.json",
    "iscc": "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    "name": "The Never Ending Story",
}

# Recover the JSON-LD context from the schema reference
result = recover_context(data)

# Result now includes @context with semantic mappings
assert "@context" in result
assert result["name"] == "The Never Ending Story"

The function also works with @type when $schema is absent:

from iscc_schema import recover_context

data = {"@type": "ISBN", "isbn": "9789295055124"}
result = recover_context(data)
# @context is resolved from the ISBN schema

Language-Agnostic Approach

Outside Python, the recovery process is simple:

  1. Fetch the JSON Schema from the $schema URL
  2. Read the default value of the @context property in the schema's properties section
  3. Merge the context value into the data object

schema = HTTP_GET(data["$schema"])
context = schema["properties"]["@context"]["default"]
data["@context"] = context
// data is now valid JSON-LD
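
The pseudocode above could be written out in Python roughly as follows. This is a sketch under stated assumptions, not the package's recover_context implementation: the SCHEMA_BY_TYPE table, the recover_context_sketch and http_get_json names, and the stubbed fake_fetch are hypothetical, and the schema layout (properties, then @context, then default) is taken from the pseudocode rather than verified against the published files.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical @type -> schema URL table; both URLs appear in this guide,
# but the mapping itself is an assumption for illustration.
SCHEMA_BY_TYPE = {
    "CreativeWork": "http://purl.org/iscc/schema/0.8.0.json",
    "ISBN": "http://purl.org/iscc/schema/isbn-0.8.0.json",
}


def http_get_json(url: str) -> dict:
    """Fetch and decode a JSON document (the HTTP_GET step)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def recover_context_sketch(
    data: dict, fetch: Callable[[str], dict] = http_get_json
) -> dict:
    """Return a copy of data with @context injected from its schema."""
    schema_url = data.get("$schema") or SCHEMA_BY_TYPE.get(data.get("@type"))
    if schema_url is None:
        raise ValueError("no $schema or known @type to resolve a schema from")
    schema = fetch(schema_url)
    # Assumed layout, following the pseudocode above.
    context = schema["properties"]["@context"]["default"]
    # An @context already present in the data takes precedence.
    return {"@context": context, **data}


def fake_fetch(url: str) -> dict:
    """Stub schema so the demonstration runs without network access."""
    return {
        "properties": {
            "@context": {"default": "http://purl.org/iscc/context/0.8.0.jsonld"}
        }
    }


result = recover_context_sketch(
    {
        "$schema": "http://purl.org/iscc/schema/0.8.0.json",
        "name": "The Never Ending Story",
    },
    fetch=fake_fetch,
)
print(result["@context"])
```

Passing the fetch function as a parameter keeps the network call swappable, so a consumer could plug in a cache keyed by schema URL and avoid refetching the same schema for every record.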

Benefits

Data stays lean because there is no need to embed the full context in every object. Any consumer can still reconstruct JSON-LD on demand. The schema does triple duty: validation, semantics, and documentation in one file.

Schema Categories

ISCC Metadata

IsccMeta is the core metadata model. All fields are optional, so it works for anything from minimal identification (just an iscc field) to full content descriptions with rights, technical properties, and cryptographic declarations.

from iscc_schema import IsccMeta

meta = IsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    name="The Never Ending Story",
    description="a 1984 fantasy film co-written and directed by Wolfgang Petersen",
)

See the ISCC Metadata schema reference for all available fields.

Seed Metadata

Seed metadata provides industry-specific input for Meta-Code generation (IEP-0002). Unlike ISCC Metadata, seed schemas have strict required fields to ensure interoperable content fingerprinting across platforms.

  • ISBN: book metadata (ISBN, title, publisher, language, etc.)
  • ISRC: sound recording metadata (ISRC, artist, track title, duration, etc.)
  • STM: scholarly metadata for DOI-identified scientific/technical/medical works (DOI, title, publisher, pubyear, etc.)

from iscc_schema import ISBN

seed = ISBN(
    isbn="9789295055124",
    productform="EA",
    title="The Never Ending Story",
    language="eng",
    imprint="Penguin Classics",
    publisher="Penguin Random House",
    country="US",
    pubdate="20240214",
)

See the ISBN, ISRC, and STM schema references.

Service Metadata

Service metadata covers use-case-specific schemas served by ISCC registries and discoverable through ISCC gateways.

  • TDM: machine-readable text and data mining reservation signals
  • GenAI: generative AI disclosure signals for content transparency
  • Identifiers: a registry response listing the external identifiers associated with an asset

Service metadata can be used standalone or embedded as nested objects in IsccMeta. Typed identifier objects are accepted by IsccMeta.identifier and by the Identifiers.identifier list:

from iscc_schema import IsccMeta

meta = IsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    name="The Never Ending Story",
    identifier=[
        {"scheme": "iswc", "code": "T-034.524.680-1", "scope": "work", "primary": True},
        {"scheme": "isrc", "code": "USRC17607839", "scope": "manifestation"},
    ],
    tdm={"tdm_reservation": 1, "tdm_policy": "https://example.com/tdmrep-policy.json"},
)
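
On the consuming side, identifier objects like these are plain dictionaries once serialized, so they can be filtered with ordinary Python. The helpers below are a hypothetical sketch and not part of iscc-schema; they assume that primary flags a preferred identifier and that scope names the level the identifier applies to, as the field names in the example above suggest.

```python
from typing import Optional

# Same identifier items as in the IsccMeta example above.
identifiers = [
    {"scheme": "iswc", "code": "T-034.524.680-1", "scope": "work", "primary": True},
    {"scheme": "isrc", "code": "USRC17607839", "scope": "manifestation"},
]


def primary_identifier(items: list[dict]) -> Optional[dict]:
    """Return the first item flagged primary, or None if none is flagged."""
    return next((item for item in items if item.get("primary")), None)


def by_scope(items: list[dict], scope: str) -> list[dict]:
    """Return all items whose scope equals the requested value."""
    return [item for item in items if item.get("scope") == scope]


print(primary_identifier(identifiers))
print(by_scope(identifiers, "manifestation"))
```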

Registries can serve the same identifier objects through the standalone Identifiers response:

from iscc_schema import Identifiers

record = Identifiers(
    iscc="ISCC:MAACAJINXFXA2SQX",
    identifier=[{"scheme": "doi", "code": "10.1234/example.2024.001", "scope": "work"}],
)

See the TDM, GenAI, and Identifiers schema references.

Protocol Schemas

Protocol schemas are ISCC Discovery Protocol wire records hosted and versioned here. Like seed metadata, they default to compact JSON with a version-specific $schema as the sole version anchor (the @context and @type are dropped and recovered on demand).

  • IsccNote: the permanent ISCC Declaration log record for ISCC-HUB timestamping and registration

See the ISCC Note schema reference.

Python Usage

Creating Metadata Objects

All models validate input on construction. Invalid data raises a ValidationError:

from pydantic import ValidationError

from iscc_schema import IsccMeta

# Valid: all fields are optional
meta = IsccMeta(iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY")

# Invalid: a malformed ISCC raises ValidationError
try:
    IsccMeta(iscc="not-a-valid-iscc")
except ValidationError as e:
    print(e)

Serialization Formats

All models support .dict(), .json(), and .jcs() serialization. Each method accepts an ld parameter that controls whether JSON-LD fields (@context, @type) are included:

from iscc_schema import IsccMeta

meta = IsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    name="The Never Ending Story",
)

# Python dict, excludes unset fields by default
meta.dict()
# {'iscc': 'ISCC:KACY...', 'name': 'The Never Ending Story'}

# JSON string, includes schema defaults (@context, @type, $schema)
meta.json()
# '{"@context":"http://purl.org/iscc/context/0.8.0.jsonld","@type":"CreativeWork","$schema":"http://purl.org/iscc/schema/0.8.0.json",...}'

# JCS canonical bytes, deterministic serialization for hashing
meta.jcs()
# b'{"$schema":"http://purl.org/iscc/schema/0.8.0.json","@context":...}'

# Compact JSON without JSON-LD fields
meta.json(ld=False)
# '{"$schema":"http://purl.org/iscc/schema/0.8.0.json","iscc":"ISCC:KACY...","name":"The Never Ending Story"}'

Field names are automatically translated to their JSON-LD aliases in all serialization formats (context_ becomes @context, type_ becomes @type, schema_ becomes $schema).
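
To show why a canonical form matters for hashing, here is a minimal standard-library approximation of JCS (RFC 8785). It is not the package's .jcs() implementation, and canonical_bytes is a hypothetical helper. For objects with ASCII keys and string values it yields sorted keys with no insignificant whitespace; full JCS additionally prescribes number formatting and key ordering rules that this sketch does not cover.

```python
import hashlib
import json


def canonical_bytes(obj: dict) -> bytes:
    """Approximate JCS: sorted keys, no extra whitespace, UTF-8 output."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


a = {
    "name": "The Never Ending Story",
    "$schema": "http://purl.org/iscc/schema/0.8.0.json",
}
b = {
    "$schema": "http://purl.org/iscc/schema/0.8.0.json",
    "name": "The Never Ending Story",
}

# Key order in the input does not affect the canonical bytes or their hash.
same_bytes = canonical_bytes(a) == canonical_bytes(b)
digest = hashlib.sha256(canonical_bytes(a)).hexdigest()
print(same_bytes, canonical_bytes(a))
```

Because both inputs serialize to identical bytes, any hash or signature computed over them is stable regardless of how the producer ordered the fields.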

Serialization Defaults by Model Type

Different model types have different ld defaults to match their intended use:

| Model | Default ld | Rationale |
| --- | --- | --- |
| IsccMeta | True | Core metadata, full JSON-LD for semantic interoperability |
| ISBN, ISRC, STM | False | Seed input for Meta-Code generation (IEP-0002), compact JSON with $schema |
| TDM, GenAI, Identifiers | True | Service metadata served by registries, full JSON-LD for discovery |
| Identifier | No JSON-LD wrapper fields | Bare identifier item used inside IsccMeta.identifier and Identifiers.identifier |
| IsccNote | False | Protocol wire record, compact JSON with version-specific $schema |

Seed metadata defaults to compact JSON because IEP-0002 accepts plain application/json and the $schema reference makes data self-describing - any consumer can recover the full JSON-LD context on demand:

from iscc_schema import ISBN

seed = ISBN(
    isbn="9789295055124",
    productform="EA",
    title="The Never Ending Story",
    language="eng",
    imprint="Penguin Classics",
    publisher="Penguin Random House",
    country="US",
    pubdate="20240214",
)

# Compact by default (ld=False)
seed.json()
# '{"$schema":"http://purl.org/iscc/schema/isbn-0.8.0.json","isbn":"9789295055124",...}'

# Full JSON-LD when needed
seed.json(ld=True)
# '{"@context":"http://purl.org/iscc/context/0.8.0.jsonld","@type":"ISBN","$schema":...}'
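
For consumers that receive full JSON-LD but want the compact form, the effect of ld=False can be approximated on a plain dictionary. The sketch below is an assumption-laden illustration, not the library's code: to_compact and LD_FIELDS are hypothetical names, and the rule (drop @context and @type, keep $schema) is inferred from the example outputs in this guide.

```python
# JSON-LD wrapper fields that the compact form omits (inferred from this guide).
LD_FIELDS = ("@context", "@type")


def to_compact(record: dict) -> dict:
    """Drop JSON-LD wrapper fields but keep $schema as the version anchor."""
    return {key: value for key, value in record.items() if key not in LD_FIELDS}


full = {
    "@context": "http://purl.org/iscc/context/0.8.0.jsonld",
    "@type": "ISBN",
    "$schema": "http://purl.org/iscc/schema/isbn-0.8.0.json",
    "isbn": "9789295055124",
}
compact = to_compact(full)
print(compact)
```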

Strict Validation

Most schema models use extra="forbid", so passing unrecognized fields raises a ValidationError. This is intentional: iscc-schema defines a standard, and strictness catches typos early. It also matches JSON-LD semantics, where extra fields without @context mappings would be meaningless to processors. Service metadata objects are the deliberate exception — TDM, GenAI, Identifiers, their nested identifier items, and the inline service objects in IsccMeta set extra="allow" to preserve forward-compatible service signals across versions.

Downstream consumers who need flexibility can subclass with a one-line override:

from pydantic import ConfigDict
from iscc_schema import IsccMeta

class FlexibleIsccMeta(IsccMeta):
    model_config = ConfigDict(extra="allow")

# Additional fields are preserved through serialization
meta = FlexibleIsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    custom_field="custom value",
)
meta.dict()
# {'iscc': 'ISCC:KACY...', 'custom_field': 'custom value'}

Using JSON Schema Directly

The published JSON Schema works with any JSON Schema validator in any language.

Python (using jsonschema):

import json
import jsonschema
import urllib.request

schema_url = "http://purl.org/iscc/schema"
schema = json.loads(urllib.request.urlopen(schema_url).read())

data = {
    "iscc": "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    "name": "The Never Ending Story",
}

jsonschema.validate(data, schema)  # raises on invalid data

JavaScript (using ajv):

import Ajv from "ajv";

const schema = await fetch("http://purl.org/iscc/schema").then(r => r.json());
const ajv = new Ajv();
const validate = ajv.compile(schema);

const data = {
  iscc: "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
  name: "The Never Ending Story",
};

if (!validate(data)) {
  console.error(validate.errors);
}