Guide¶
Understanding ISCC Schemas¶
Digital content identification requires structured metadata: data that describes what a piece of content is, who created it, and how it may be used. Without a shared vocabulary, every platform invents its own metadata format, and interoperability breaks down.
ISCC schemas solve this by defining a single, machine-readable vocabulary for digital content metadata. They are part of the broader ISCC ecosystem, which provides content-derived identifiers standardized as ISO 24138:2024.
The iscc-schema package is the canonical source for these definitions. From YAML-based OpenAPI
3.1.0 source files, the build pipeline generates:
- JSON Schema for validating metadata objects
- JSON-LD Context for semantic mappings to established vocabularies (schema.org, Dublin Core)
- Python Models (Pydantic) for creating and validating metadata in Python
JSON Schema and JSON-LD¶
ISCC metadata uses two complementary standards that serve different needs:
JSON Schema is the primary interface for most developers. It defines field types, constraints, and defaults for validation — but it also carries human-readable field descriptions, examples, and documentation inline. This makes it self-documenting: a developer (or an AI agent) can read the schema and understand every field without consulting external resources.
JSON-LD Context maps compact field names like name or creator to global semantic URIs
(e.g., http://schema.org/name), enabling ISCC metadata to participate in the Linked Data web.
This is valuable for semantic interoperability across systems, though it requires dereferencing
URIs to access human-readable descriptions.
Both are generated from the same YAML source definitions, so they are always in sync. Since v0.5.0, JSON Schema files embed the JSON-LD context directly - one file provides validation rules, inline documentation, and semantic mappings.
For most use cases, plain JSON with a $schema reference is all you need. The schema handles
validation and documentation. When semantic interoperability matters, the embedded JSON-LD context
is always available - either carried in the data directly, or recovered from the schema on demand
(see Schema-Driven Context Recovery below).
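The mapping from compact field names to semantic URIs can be pictured with a small sketch. This is a deliberately simplified illustration, not a JSON-LD processor: the `expand_terms` helper and the toy `CONTEXT` dictionary are hypothetical, and the `creator` mapping is an assumption (only the `name` mapping appears in the text above).

```python
# Toy context. "name" is taken from the text above; "creator" is an assumed
# mapping used only for illustration.
CONTEXT = {
    "name": "http://schema.org/name",
    "creator": "http://schema.org/creator",  # assumption
}


def expand_terms(data: dict, context: dict) -> dict:
    """Replace compact keys with the URIs the context maps them to.

    Keys without a mapping are left unchanged. Real JSON-LD expansion does
    considerably more (value objects, nested contexts, keyword handling).
    """
    return {context.get(key, key): value for key, value in data.items()}


compact = {"name": "The Never Ending Story", "creator": "Example Author"}
expanded = expand_terms(compact, CONTEXT)
print(expanded)
```

For real workloads a conforming JSON-LD library is the safer choice; the sketch only shows why a shared context removes ambiguity from short field names.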
Schema-Driven Context Recovery¶
iscc-schema v0.5.0 introduced schema-driven context recovery: reconstructing full JSON-LD from compact, plain JSON data.
Plain JSON is compact and easy to work with, but a field called name could mean anything without
semantic context. JSON-LD fixes this with an @context that maps fields to global identifiers -
but carrying the full context in every object adds bulk. Since the $schema field already points
to the JSON Schema describing the data, the schema can embed the @context too. Any consumer can
then recover the full JSON-LD context from the schema reference alone.
How It Works¶
flowchart LR
A["Plain JSON"] --> B["Read $schema"]
B --> C["Extract @context\nfrom schema"]
C --> D["Valid JSON-LD"]
- Read the $schema URL from the data (or infer it from @type)
- Look up the embedded @context for that schema
- Inject the context into the data, which is now valid JSON-LD
Python Example¶
from iscc_schema import recover_context
# Plain JSON data without @context
data = {
    "$schema": "http://purl.org/iscc/schema/0.8.0.json",
    "iscc": "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    "name": "The Never Ending Story",
}
# Recover the JSON-LD context from the schema reference
result = recover_context(data)
# Result now includes @context with semantic mappings
assert "@context" in result
assert result["name"] == "The Never Ending Story"
The function also works with @type when $schema is absent:
from iscc_schema import recover_context
data = {"@type": "ISBN", "isbn": "9789295055124"}
result = recover_context(data)
# @context is resolved from the ISBN schema
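How the fallback from $schema to @type might be resolved can be sketched as a simple lookup. This is not the library's actual implementation: `SCHEMA_BY_TYPE` and `resolve_schema_url` are hypothetical names, and the table covers only the two type/URL pairs that appear in the examples in this guide.

```python
# Hypothetical lookup table. The type names and URLs are the ones shown in
# this guide's serialization examples; a real table would cover every schema.
SCHEMA_BY_TYPE = {
    "CreativeWork": "http://purl.org/iscc/schema/0.8.0.json",
    "ISBN": "http://purl.org/iscc/schema/isbn-0.8.0.json",
}


def resolve_schema_url(data: dict) -> str:
    """Prefer an explicit $schema; otherwise fall back to a lookup by @type."""
    if "$schema" in data:
        return data["$schema"]
    type_name = data.get("@type")
    if type_name in SCHEMA_BY_TYPE:
        return SCHEMA_BY_TYPE[type_name]
    raise ValueError(f"cannot determine schema for @type={type_name!r}")


print(resolve_schema_url({"@type": "ISBN", "isbn": "9789295055124"}))
```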
Language-Agnostic Approach¶
Outside Python, the recovery process is simple:
- Fetch the JSON Schema from the $schema URL
- Read the @context property from the schema's properties section
- Merge the context value into the data object
schema = HTTP_GET(data["$schema"])
context = schema["properties"]["@context"]["default"]
data["@context"] = context
// data is now valid JSON-LD
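The pseudocode above could be written with only the Python standard library, as in the following sketch. The `fetch_json` and `recover` names are hypothetical, and the offline demonstration uses a stand-in schema dictionary rather than a real download, so the context URL shown is simply the one that appears elsewhere in this guide.

```python
import json
import urllib.request
from typing import Callable


def fetch_json(url: str) -> dict:
    """Download and parse a JSON document (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def recover(data: dict, fetch: Callable[[str], dict] = fetch_json) -> dict:
    """Return a copy of data with @context taken from its schema's default.

    An @context already present in data takes precedence over the schema's.
    """
    schema = fetch(data["$schema"])
    context = schema["properties"]["@context"]["default"]
    return {"@context": context, **data}


# Offline demonstration: a stand-in schema replaces the real HTTP fetch.
fake_schema = {
    "properties": {
        "@context": {"default": "http://purl.org/iscc/context/0.8.0.jsonld"}
    }
}
doc = {
    "$schema": "http://purl.org/iscc/schema/0.8.0.json",
    "name": "The Never Ending Story",
}
result = recover(doc, fetch=lambda url: fake_schema)
print(result["@context"])
```

Passing the fetch function as a parameter keeps the recovery logic testable without network access and leaves room for caching schemas that are requested repeatedly.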
Benefits¶
Data stays lean because there is no need to embed the full context in every object. Any consumer can still reconstruct JSON-LD on demand. The schema does triple duty: validation, semantics, and documentation in one file.
Schema Categories¶
ISCC Metadata¶
IsccMeta is the core metadata model. All fields are optional, so it works for anything from
minimal identification (just an iscc field) to full content descriptions with rights, technical
properties, and cryptographic declarations.
from iscc_schema import IsccMeta
meta = IsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    name="The Never Ending Story",
    description="a 1984 fantasy film co-written and directed by Wolfgang Petersen",
)
See the ISCC Metadata schema reference for all available fields.
Seed Metadata¶
Seed metadata provides industry-specific input for Meta-Code generation (IEP-0002). Unlike ISCC Metadata, seed schemas have strict required fields to ensure interoperable content fingerprinting across platforms.
- ISBN: book metadata (ISBN, title, publisher, language, etc.)
- ISRC: sound recording metadata (ISRC, artist, track title, duration, etc.)
- STM: scholarly metadata for DOI-identified scientific/technical/medical works (DOI, title, publisher, pubyear, etc.)
from iscc_schema import ISBN
seed = ISBN(
    isbn="9789295055124",
    productform="EA",
    title="The Never Ending Story",
    language="eng",
    imprint="Penguin Classics",
    publisher="Penguin Random House",
    country="US",
    pubdate="20240214",
)
See the ISBN, ISRC, and STM schema references.
Service Metadata¶
Service metadata covers use-case-specific schemas served by ISCC registries and discoverable through ISCC gateways.
- TDM: machine-readable text and data mining reservation signals
- GenAI: generative AI disclosure signals for content transparency
- Identifiers: a registry response listing the external identifiers associated with an asset
Service metadata can be used standalone or embedded as nested objects in IsccMeta. Typed
identifier objects are accepted by IsccMeta.identifier and by the Identifiers.identifier list:
from iscc_schema import IsccMeta
meta = IsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    name="The Never Ending Story",
    identifier=[
        {"scheme": "iswc", "code": "T-034.524.680-1", "scope": "work", "primary": True},
        {"scheme": "isrc", "code": "USRC17607839", "scope": "manifestation"},
    ],
    tdm={"tdm_reservation": 1, "tdm_policy": "https://example.com/tdmrep-policy.json"},
)
Registries can serve the same identifier objects through the standalone Identifiers response:
from iscc_schema import Identifiers
record = Identifiers(
    iscc="ISCC:MAACAJINXFXA2SQX",
    identifier=[{"scheme": "doi", "code": "10.1234/example.2024.001", "scope": "work"}],
)
See the TDM, GenAI, and Identifiers schema references.
Protocol Schemas¶
Protocol schemas are ISCC Discovery Protocol wire records hosted and versioned in iscc-schema. Like seed
metadata, they default to compact JSON with a version-specific $schema as the sole version anchor
(the @context and @type are dropped and recovered on demand).
IsccNote: the permanent ISCC Declaration log record for ISCC-HUB timestamping and registration
See the ISCC Note schema reference.
Python Usage¶
Creating Metadata Objects¶
All models validate input on construction. Invalid data raises a ValidationError:
from iscc_schema import IsccMeta
# Valid: all fields are optional
meta = IsccMeta(iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY")
# Invalid: raises ValidationError
try:
    IsccMeta(iscc="not-a-valid-iscc")
except Exception as e:
    print(e)
Serialization Formats¶
All models support .dict(), .json(), and .jcs() serialization. Each method accepts an ld
parameter that controls whether JSON-LD fields (@context, @type) are included:
from iscc_schema import IsccMeta
meta = IsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    name="The Never Ending Story",
)
# Python dict, excludes unset fields by default
meta.dict()
# {'iscc': 'ISCC:KACY...', 'name': 'The Never Ending Story'}
# JSON string, includes schema defaults (@context, @type, $schema)
meta.json()
# '{"@context":"http://purl.org/iscc/context/0.8.0.jsonld","@type":"CreativeWork","$schema":"http://purl.org/iscc/schema/0.8.0.json",...}'
# JCS canonical bytes, deterministic serialization for hashing
meta.jcs()
# b'{"$schema":"http://purl.org/iscc/schema/0.8.0.json","@context":...}'
# Compact JSON without JSON-LD fields
meta.json(ld=False)
# '{"$schema":"http://purl.org/iscc/schema/0.8.0.json","iscc":"ISCC:KACY...","name":"The Never Ending Story"}'
Field names are automatically translated to their JSON-LD aliases in all serialization formats
(context_ → @context, type_ → @type, schema_ → $schema).
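Why canonical bytes matter for hashing can be shown with a standard-library approximation. The sketch assumes that .jcs() follows the JSON Canonicalization Scheme (RFC 8785); the `canonical_bytes` helper is hypothetical and only approximates that scheme, since `json.dumps` does not reproduce RFC 8785's number formatting or its UTF-16 key ordering for characters outside the Basic Multilingual Plane.

```python
import hashlib
import json


def canonical_bytes(obj: dict) -> bytes:
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8 output."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


# The same data with keys inserted in a different order.
a = {"name": "The Never Ending Story", "$schema": "http://purl.org/iscc/schema/0.8.0.json"}
b = {"$schema": "http://purl.org/iscc/schema/0.8.0.json", "name": "The Never Ending Story"}

# Key order in the input affects neither the canonical form nor its hash.
digest_a = hashlib.sha256(canonical_bytes(a)).hexdigest()
digest_b = hashlib.sha256(canonical_bytes(b)).hexdigest()
print(digest_a == digest_b)
```

For anything that must interoperate with other implementations, the model's own .jcs() method is the safer choice over a hand-rolled serializer like this one.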
Serialization Defaults by Model Type¶
Different model types have different ld defaults to match their intended use:
| Model | Default ld | Rationale |
|---|---|---|
| IsccMeta | True | Core metadata, full JSON-LD for semantic interoperability |
| ISBN, ISRC, STM | False | Seed input for Meta-Code generation (IEP-0002), compact JSON with $schema |
| TDM, GenAI, Identifiers | True | Service metadata served by registries, full JSON-LD for discovery |
| Identifier | No JSON-LD wrapper fields | Bare identifier item used inside IsccMeta.identifier and Identifiers.identifier |
| IsccNote | False | Protocol wire record, compact JSON with version-specific $schema |
Seed metadata defaults to compact JSON because IEP-0002 accepts plain application/json and
the $schema reference makes data self-describing - any consumer can recover the full JSON-LD
context on demand:
from iscc_schema import ISBN
seed = ISBN(
    isbn="9789295055124",
    productform="EA",
    title="The Never Ending Story",
    language="eng",
    imprint="Penguin Classics",
    publisher="Penguin Random House",
    country="US",
    pubdate="20240214",
)
# Compact by default (ld=False)
seed.json()
# '{"$schema":"http://purl.org/iscc/schema/isbn-0.8.0.json","isbn":"9789295055124",...}'
# Full JSON-LD when needed
seed.json(ld=True)
# '{"@context":"http://purl.org/iscc/context/0.8.0.jsonld","@type":"ISBN","$schema":...}'
Strict Validation¶
Most schema models use extra="forbid", so passing unrecognized fields raises a ValidationError.
This is intentional: iscc-schema defines a standard, and strictness catches typos early. It also
matches JSON-LD semantics, where extra fields without @context mappings would be meaningless to
processors. Service metadata objects are the deliberate exception — TDM, GenAI,
Identifiers, their nested identifier items, and the inline service objects in IsccMeta set
extra="allow" to preserve forward-compatible service signals across versions.
Downstream consumers who need flexibility can subclass with a one-line override:
from pydantic import ConfigDict
from iscc_schema import IsccMeta
class FlexibleIsccMeta(IsccMeta):
    model_config = ConfigDict(extra="allow")

# Additional fields are preserved through serialization
meta = FlexibleIsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    custom_field="custom value",
)
meta.dict()
# {'iscc': 'ISCC:KACY...', 'custom_field': 'custom value'}
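The difference between the two policies can be illustrated without any dependencies. The sketch below is not how Pydantic implements extra="forbid" or extra="allow"; `check_fields` and the `KNOWN` field set are hypothetical and cover only a small subset of fields chosen for illustration.

```python
def check_fields(data: dict, known: set, extra: str = "forbid") -> dict:
    """Mimic the two policies: reject unknown keys, or pass them through."""
    unknown = sorted(set(data) - known)
    if unknown and extra == "forbid":
        raise ValueError(f"unrecognized fields: {unknown}")
    return dict(data)


KNOWN = {"iscc", "name", "description"}  # small illustrative subset

# Strict mode surfaces a misspelled field name immediately.
try:
    check_fields({"iscc": "ISCC:MAACAJINXFXA2SQX", "nmae": "typo"}, KNOWN)
except ValueError as err:
    print(err)

# Permissive mode keeps unknown fields for forward compatibility.
kept = check_fields(
    {"iscc": "ISCC:MAACAJINXFXA2SQX", "custom_field": "x"}, KNOWN, extra="allow"
)
print(kept)
```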
Using JSON Schema Directly¶
The published JSON Schema works with any JSON Schema validator in any language.
Python (using jsonschema):
import json
import jsonschema
import urllib.request
schema_url = "http://purl.org/iscc/schema"
schema = json.loads(urllib.request.urlopen(schema_url).read())
data = {
    "iscc": "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    "name": "The Never Ending Story",
}
jsonschema.validate(data, schema) # raises on invalid data
JavaScript (using ajv):
import Ajv from "ajv";
const schema = await fetch("http://purl.org/iscc/schema").then(r => r.json());
const ajv = new Ajv();
const validate = ajv.compile(schema);
const data = {
  iscc: "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
  name: "The Never Ending Story",
};
if (!validate(data)) {
  console.error(validate.errors);
}