Guide¶
Understanding ISCC Schemas¶
Digital content identification requires structured metadata: data that describes what a piece of content is, who created it, and how it may be used. Without a shared vocabulary, every platform invents its own metadata format, and interoperability breaks down.
ISCC schemas solve this by defining a single, machine-readable vocabulary for digital content metadata. They are part of the broader ISCC ecosystem, which provides content-derived identifiers standardized as ISO 24138:2024.
The iscc-schema package is the canonical source for these definitions. From YAML-based OpenAPI
3.1.0 source files, the build pipeline generates:
- JSON Schema for validating metadata objects
- JSON-LD Context for semantic mappings to established vocabularies (schema.org, Dublin Core)
- Python Models (Pydantic) for creating and validating metadata in Python
JSON Schema and JSON-LD¶
ISCC metadata uses two complementary standards that serve different needs:
JSON Schema is the primary interface for most developers. It defines field types, constraints, and defaults for validation, and it also carries human-readable field descriptions, examples, and documentation inline. This makes it self-documenting: a developer (or an AI agent) can read the schema and understand every field without consulting external resources.
JSON-LD Context maps compact field names like name or creator to global semantic URIs
(e.g., http://schema.org/name), enabling ISCC metadata to participate in the Linked Data web.
This is valuable for semantic interoperability across systems, though it requires dereferencing
URIs to access human-readable descriptions.
Both are generated from the same YAML source definitions, so they are always in sync. Since v0.5.0, JSON Schema files embed the JSON-LD context directly - one file provides validation rules, inline documentation, and semantic mappings.
For most use cases, plain JSON with a $schema reference is all you need. The schema handles
validation and documentation. When semantic interoperability matters, the embedded JSON-LD context
is always available - either carried in the data directly, or recovered from the schema on demand
(see Schema-Driven Context Recovery below).
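To make the role of the JSON-LD context more concrete, the sketch below shows how a context turns compact field names into global URIs. It is a toy illustration, not a conforming JSON-LD processor: the `expand_terms` helper is hypothetical, and only the `name` mapping is taken from this guide (the `creator` URI is an assumption).

```python
# Toy context. The "name" mapping is quoted in this guide; the "creator"
# mapping is an assumption made for illustration only.
CONTEXT = {
    "name": "http://schema.org/name",
    "creator": "http://schema.org/creator",
}


def expand_terms(data: dict, context: dict) -> dict:
    """Replace compact keys with the URIs the context maps them to.

    Keys without a mapping are dropped, loosely mirroring how JSON-LD
    expansion ignores terms it cannot resolve.
    """
    return {context[key]: value for key, value in data.items() if key in context}


expanded = expand_terms({"name": "The Never Ending Story", "unmapped": 1}, CONTEXT)
print(expanded)
```

A real application would normally hand this job to a JSON-LD library. The point here is only that the context is what gives a bare key like name a globally unambiguous meaning.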
Schema-Driven Context Recovery¶
iscc-schema v0.5.0 introduced schema-driven context recovery: reconstructing full JSON-LD from compact, plain JSON data.
Plain JSON is compact and easy to work with, but a field called name could mean anything without
semantic context. JSON-LD fixes this with an @context that maps fields to global identifiers -
but carrying the full context in every object adds bulk. Since the $schema field already points
to the JSON Schema describing the data, the schema can embed the @context too. Any consumer can
then recover the full JSON-LD context from the schema reference alone.
How It Works¶
flowchart LR
A["Plain JSON"] --> B["Read $schema"]
B --> C["Extract @context\nfrom schema"]
C --> D["Valid JSON-LD"]
- Read the $schema URL from the data (or infer it from @type)
- Look up the embedded @context for that schema
- Inject the context into the data, which is now valid JSON-LD
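The three steps above can be sketched with plain dictionaries. This is a minimal illustration under stated assumptions, not how iscc-schema implements it: `inject_context`, `CONTEXT_BY_SCHEMA`, and `SCHEMA_BY_TYPE` are hypothetical names, and the example.org URLs for the ISBN entry are placeholders.

```python
import copy

# Hypothetical in-memory registries standing in for the published schemas.
# The purl.org URLs appear in this guide; the example.org URLs are placeholders.
CONTEXT_BY_SCHEMA = {
    "http://purl.org/iscc/schema": "http://purl.org/iscc/context",
    "https://example.org/isbn-schema": "https://example.org/isbn-context",
}
SCHEMA_BY_TYPE = {"ISBN": "https://example.org/isbn-schema"}


def inject_context(data: dict) -> dict:
    """Return a copy of data with @context added, following the three steps."""
    # Step 1: read $schema, or fall back to inferring it from @type.
    schema_url = data.get("$schema") or SCHEMA_BY_TYPE.get(data.get("@type"))
    if schema_url is None:
        raise ValueError("data has neither a $schema nor a known @type")
    # Step 2: look up the context embedded in that schema.
    if schema_url not in CONTEXT_BY_SCHEMA:
        raise ValueError(f"no embedded context known for {schema_url}")
    # Step 3: inject the context without mutating the caller's object.
    result = copy.deepcopy(data)
    result["@context"] = CONTEXT_BY_SCHEMA[schema_url]
    return result


by_schema = inject_context({"$schema": "http://purl.org/iscc/schema", "name": "x"})
by_type = inject_context({"@type": "ISBN", "isbn": "9789295055124"})
```

Returning a copy keeps the original plain JSON untouched, which is convenient when the same object is also stored or hashed in its compact form.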
Python Example¶
from iscc_schema import recover_context
# Plain JSON data without @context
data = {
    "$schema": "http://purl.org/iscc/schema",
    "iscc": "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    "name": "The Never Ending Story",
}
# Recover the JSON-LD context from the schema reference
result = recover_context(data)
# Result now includes @context with semantic mappings
assert "@context" in result
assert result["name"] == "The Never Ending Story"
The function also works with @type when $schema is absent:
from iscc_schema import recover_context
data = {"@type": "ISBN", "isbn": "9789295055124"}
result = recover_context(data)
# @context is resolved from the ISBN schema
Language-Agnostic Approach¶
Outside Python, the recovery process is simple:
- Fetch the JSON Schema from the $schema URL
- Read the @context property from the schema's properties section
- Merge the context value into the data object
schema = HTTP_GET(data["$schema"])
context = schema["properties"]["@context"]["default"]
data["@context"] = context
// data is now valid JSON-LD
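For readers who want a runnable version of the pseudocode, here is a standard-library Python sketch. The function names `fetch_schema` and `recover_from_schema` are hypothetical. The lookup path `properties -> @context -> default` follows the pseudocode above, and the fetcher is injectable so the logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable


def fetch_schema(url: str) -> dict:
    """Download and parse a JSON Schema document."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def recover_from_schema(
    data: dict, fetch: Callable[[str], dict] = fetch_schema
) -> dict:
    """Return a copy of data with the schema's embedded @context merged in."""
    schema = fetch(data["$schema"])
    context = schema["properties"]["@context"]["default"]
    return {**data, "@context": context}


# Offline usage with a stand-in schema instead of a real HTTP request.
fake_schema = {
    "properties": {"@context": {"default": "http://purl.org/iscc/context"}}
}
doc = recover_from_schema(
    {"$schema": "http://purl.org/iscc/schema", "name": "The Never Ending Story"},
    fetch=lambda url: fake_schema,
)
```

In practice a consumer would likely cache fetched schemas by URL, since the same schema is referenced by many objects.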
Benefits¶
Data stays lean because there is no need to embed the full context in every object. Any consumer can still reconstruct JSON-LD on demand. The schema does triple duty: validation, semantics, and documentation in one file.
Schema Categories¶
ISCC Metadata¶
IsccMeta is the core metadata model. All fields are optional, so it works for anything from
minimal identification (just an iscc field) to full content descriptions with rights, technical
properties, and cryptographic declarations.
from iscc_schema import IsccMeta
meta = IsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    name="The Never Ending Story",
    description="a 1984 fantasy film co-written and directed by Wolfgang Petersen",
)
See the ISCC Metadata schema reference for all available fields.
Seed Metadata¶
Seed metadata provides industry-specific input for Meta-Code generation (IEP-0002). Unlike ISCC Metadata, seed schemas have strict required fields to ensure interoperable content fingerprinting across platforms.
- ISBN: book metadata (ISBN, title, publisher, language, etc.)
- ISRC: sound recording metadata (ISRC, artist, track title, duration, etc.)
from iscc_schema import ISBN
seed = ISBN(
    isbn="9789295055124",
    title="The Never Ending Story",
    language="eng",
    publisher="Penguin Random House",
)
See the ISBN and ISRC schema references.
Service Metadata¶
Service metadata covers use-case-specific schemas served by ISCC registries and discoverable through ISCC gateways.
- TDM: machine-readable text and data mining reservation signals
- GenAI: generative AI disclosure signals for content transparency
Service metadata can be used standalone or embedded as nested objects in IsccMeta:
from iscc_schema import IsccMeta
meta = IsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    name="The Never Ending Story",
    tdm={"train": "reserved", "inference": "open"},
)
See the TDM and GenAI schema references.
Python Usage¶
Creating Metadata Objects¶
All models validate input on construction. Invalid data raises a ValidationError:
from pydantic import ValidationError
from iscc_schema import IsccMeta
# Valid: all fields are optional
meta = IsccMeta(iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY")
# Invalid: raises ValidationError
try:
    IsccMeta(iscc="not-a-valid-iscc")
except ValidationError as e:
    print(e)
Serialization Formats¶
The models support three serialization methods:
from iscc_schema import IsccMeta
meta = IsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    name="The Never Ending Story",
)
# Python dict, excludes unset fields by default
meta.dict()
# {'iscc': 'ISCC:KACY...', 'name': 'The Never Ending Story'}
# JSON string, includes schema defaults (@context, @type, $schema)
meta.json()
# '{"@context":"http://purl.org/iscc/context","@type":"CreativeWork",...}'
# JCS canonical bytes, deterministic serialization for hashing
meta.jcs()
# b'{"$schema":"http://purl.org/iscc/schema","@context":...}'
Field names are automatically translated to their JSON-LD aliases in all serialization formats
(context_ → @context, type_ → @type, schema_ → $schema).
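The alias translation and the idea behind deterministic serialization can be illustrated with the standard library alone. The sketch below is an approximation under stated assumptions: `to_aliases` and `canonical_bytes` are hypothetical helpers, and sorted-key compact JSON only matches JCS (RFC 8785) for simple data such as plain ASCII strings. It does not reproduce the JCS rules for number formatting or key ordering of non-BMP characters, so it is not a substitute for meta.jcs().

```python
import hashlib
import json

# Alias table taken from the mapping described above.
ALIASES = {"context_": "@context", "type_": "@type", "schema_": "$schema"}


def to_aliases(fields: dict) -> dict:
    """Rename Python-safe field names to their JSON-LD aliases."""
    return {ALIASES.get(key, key): value for key, value in fields.items()}


def canonical_bytes(obj: dict) -> bytes:
    """Approximate canonical form: sorted keys, no whitespace, UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


# The same fields in a different insertion order serialize identically.
a = canonical_bytes(to_aliases({"name": "Story", "schema_": "http://purl.org/iscc/schema"}))
b = canonical_bytes(to_aliases({"schema_": "http://purl.org/iscc/schema", "name": "Story"}))
digest_a = hashlib.sha256(a).hexdigest()
digest_b = hashlib.sha256(b).hexdigest()
print(a)
```

Because the byte output does not depend on field order, hashes computed over it are stable across producers, which is the property canonical serialization is meant to provide.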
Strict Validation¶
All schema models use extra="forbid", so passing unrecognized fields raises a ValidationError.
This is intentional: iscc-schema defines a standard, and strictness catches typos early. It also
matches JSON-LD semantics, where extra fields without @context mappings would be meaningless to
processors.
Downstream consumers who need flexibility can subclass with a one-line override:
from pydantic import ConfigDict
from iscc_schema import IsccMeta
class FlexibleIsccMeta(IsccMeta):
    model_config = ConfigDict(extra="allow")
# Additional fields are preserved through serialization
meta = FlexibleIsccMeta(
    iscc="ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    custom_field="custom value",
)
meta.dict()
# {'iscc': 'ISCC:KACY...', 'custom_field': 'custom value'}
Using JSON Schema Directly¶
The published JSON Schema works with any JSON Schema validator in any language.
Python (using jsonschema):
import json
import jsonschema
import urllib.request
schema_url = "http://purl.org/iscc/schema"
schema = json.loads(urllib.request.urlopen(schema_url).read())
data = {
    "iscc": "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
    "name": "The Never Ending Story",
}
jsonschema.validate(data, schema) # raises on invalid data
JavaScript (using ajv):
import Ajv from "ajv";
const schema = await fetch("http://purl.org/iscc/schema").then(r => r.json());
const ajv = new Ajv();
const validate = ajv.compile(schema);
const data = {
  iscc: "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2M5AEGQY",
  name: "The Never Ending Story",
};
if (!validate(data)) {
  console.error(validate.errors);
}