For Coding Agents
Dense, prescriptive reference for AI coding agents working on or integrating with iscc-schema.
Architecture Map
File Layout
| Path | Contents | Editable? |
|---|---|---|
| iscc_schema/__init__.py | Public API exports: IsccMeta, Signature, ISBN, ISRC, STM, TDM, GenAI, Identifier, Identifiers, IsccNote, recover_context | Yes |
| iscc_schema/base.py | Custom BaseModel: serialization (dict, json, jcs), declared-field empty-value coercion | Yes |
| iscc_schema/fields.py | RFC 3986-compliant AnyUrl type with regex validation | Yes |
| iscc_schema/recovery.py | recover_context(): reconstruct @context from $schema or @type | Yes |
| iscc_schema/aliases.json | Maps @context→context_, @type→type_, $schema→schema_ | Yes |
| iscc_schema/schema.py | Generated: main IsccMeta Pydantic v2 model | No |
| iscc_schema/generator.py | Generated: OpenAPI models for ISCC Generator API | No |
| iscc_schema/seed_isbn.py | Generated: ISBN model | No |
| iscc_schema/seed_isrc.py | Generated: ISRC model | No |
| iscc_schema/seed_stm.py | Generated: STM model | No |
| iscc_schema/service_tdm.py | Generated: TDM model | No |
| iscc_schema/service_genai.py | Generated: GenAI model | No |
| iscc_schema/service_identifiers.py | Generated: Identifiers service model and bare Identifier item model | No |
| iscc_schema/protocol_iscc_note.py | Generated: IsccNote model | No |
| iscc_schema/contexts.py | Generated: JSON-LD context mappings + TYPE_SCHEMAS dispatch | No |
| iscc_schema/models/*.yaml | Source of truth: OpenAPI 3.1.0 schema definitions | Yes |
| iscc_schema/models/iscc-all.yaml | Composition manifest (allOf + $ref to individual schemas) | Yes |
| tools/build_code.py | Code generation: YAML → datamodel-code-generator → Pydantic + post-processing | Yes |
| tools/build_json_schema.py | Flatten individual YAML schemas into merged iscc.json | Yes |
| tools/build_json_ld_context.py | Generate JSON-LD context files from model metadata | Yes |
| tools/build_terms.py | Generate vocabulary markdown from x-iscc-context fields | Yes |
| tools/build_docs.py | Generate documentation pages from YAML schemas + README/CHANGELOG | Yes |
| tools/format_yaml.py | Reformat YAML (2-space indent, 88 width, LF) | Yes |
Schema Composition
IsccMeta (iscc-all.yaml composes via allOf + $ref):
├── iscc-jsonld.yaml → @context, @type, $schema
├── iscc-minimal.yaml → iscc
├── iscc-basic.yaml → name, description, meta
├── iscc-embeddable.yaml → creator, license, credit, rights, acquire
├── iscc-extended.yaml → media_id, iscc_id, image, identifier, keywords, form, version, tdm, genai
├── iscc-technical.yaml → mode, filename, filesize, datasize, mediatype, duration, fps, width, height, created
├── iscc-crypto.yaml → tophash, metahash, datahash, nonce, signature
├── iscc-nft.yaml → external_url, animation_url, properties, attributes, nft
└── iscc-declaration.yaml → original, redirect, chain, wallet, credentials, verifications
Standalone (NOT in IsccMeta):
├── isbn.yaml → ISBN (seed metadata)
├── isrc.yaml → ISRC (seed metadata)
├── stm.yaml → STM (seed metadata; scholarly / DOI works)
├── tdm.yaml → TDM (service metadata; also inline in IsccMeta.tdm)
├── genai.yaml → GenAI (service metadata; also inline in IsccMeta.genai)
├── identifiers.yaml → Identifiers (service metadata; asset-to-identifiers response, includes Identifier items)
└── iscc-note.yaml → IsccNote (protocol record; ISCC Discovery Protocol declaration)
Import Dependency Flow
iscc_schema.__init__
→ iscc_schema.schema (generated) → iscc_schema.base → pydantic.BaseModel
→ iscc_schema.fields (AnyUrl)
→ iscc_schema.seed_isbn (generated) → iscc_schema.base
→ iscc_schema.seed_isrc (generated) → iscc_schema.base
→ iscc_schema.seed_stm (generated) → iscc_schema.base
→ iscc_schema.service_tdm (generated) → iscc_schema.base
→ iscc_schema.service_genai (generated) → iscc_schema.base
→ iscc_schema.service_identifiers (generated) → iscc_schema.base
→ iscc_schema.protocol_iscc_note (generated) → iscc_schema.base
→ iscc_schema.recovery → iscc_schema.contexts (generated)
Public API Exports
from iscc_schema import IsccMeta # Main metadata model (all fields)
from iscc_schema import Signature # Cryptographic signature (nested)
from iscc_schema import ISBN # Seed metadata
from iscc_schema import ISRC # Seed metadata
from iscc_schema import STM # Seed metadata (scholarly / DOI works)
from iscc_schema import TDM # Service metadata
from iscc_schema import GenAI # Service metadata
from iscc_schema import Identifier # Bare typed external identifier item
from iscc_schema import Identifiers # Asset-to-identifiers service metadata
from iscc_schema import IsccNote # Protocol record (ISCC Discovery Protocol)
from iscc_schema import recover_context # JSON-LD context recovery
Decision Dispatch
Which model to use?
| Use case | Model | Source YAML |
|---|---|---|
| General ISCC metadata | IsccMeta | iscc-all.yaml (composed) |
| ISBN-based Meta-Code generation | ISBN | isbn.yaml |
| ISRC-based Meta-Code generation | ISRC | isrc.yaml |
| DOI / scholarly-work Meta-Code generation | STM | stm.yaml |
| TDM reservation signals | TDM | tdm.yaml |
| GenAI disclosure signals | GenAI | genai.yaml |
| One typed external identifier | Identifier | identifiers.yaml item schema |
| Asset-to-identifiers discovery response | Identifiers | identifiers.yaml |
| ISCC Discovery Protocol declaration record | IsccNote | iscc-note.yaml |
| API request/response models | generator.py models | iscc-generator.yaml |
Which serialization method?
All methods accept an ld parameter (bool) to control JSON-LD field inclusion.
| Need | Method | Behavior |
|---|---|---|
| JSON-LD output | obj.json() or obj.json(ld=True) | Includes @context, @type, $schema |
| Compact JSON | obj.json(ld=False) | Only $schema + data fields |
| Dict for processing | obj.dict() | exclude_none=True, exclude_unset=True, by_alias=True |
| Canonical bytes for signing | obj.jcs() | JCS-canonicalized JSON bytes |
| Pydantic v2 native | obj.model_dump() / obj.model_dump_json() | Used internally by dict() / json() |
Default ld value depends on model type:
| Model | Default ld | Why |
|---|---|---|
| IsccMeta | True | Core metadata, full JSON-LD |
| ISBN, ISRC, STM | False | Seed input for Meta-Code generation |
| TDM, GenAI, Identifiers | True | Service metadata for registry discovery |
| Identifier | No JSON-LD wrapper fields | Bare item model used inside identifier lists |
| IsccNote | False | Protocol wire record, compact with version-specific $schema |
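As a rough mental model of the ld flag, the sketch below filters a plain dict the way the tables above describe: ld=True keeps @context, @type, and $schema, while ld=False keeps only $schema plus data fields. The name to_json_sketch and all URL and type values are hypothetical placeholders, and this is not the library's implementation, which operates on Pydantic models.

```python
import json

# Keys that appear only in full JSON-LD output, per the tables above.
LD_ONLY_KEYS = ("@context", "@type")


def to_json_sketch(data: dict, ld: bool = True) -> str:
    """Serialize a dict, dropping None values and, if ld is False, JSON-LD keys."""
    # Mirror exclude_none=True: None values never reach the output.
    cleaned = {key: value for key, value in data.items() if value is not None}
    if not ld:
        # Compact form: $schema and data fields stay, @context and @type go.
        cleaned = {k: v for k, v in cleaned.items() if k not in LD_ONLY_KEYS}
    return json.dumps(cleaned)


record = {
    "@context": "https://example.org/context.jsonld",  # placeholder value
    "@type": "ExampleType",  # placeholder value
    "$schema": "https://example.org/schema.json",  # placeholder value
    "iscc": "ISCC:KACYPXW445FTYNJ3",
    "description": None,
}
full = json.loads(to_json_sketch(record))
compact = json.loads(to_json_sketch(record, ld=False))
```

With the real models, the equivalent calls would be obj.json() and obj.json(ld=False); the default depends on the model type as listed in the table.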
Which build command?
| Changed | Run |
|---|---|
| Any YAML schema | uv run poe all (full pipeline) |
| Only Python code (base.py, fields.py) | uv run pytest |
| Only build tool scripts | uv run poe all |
| Quick code regen only | uv run poe buildcode |
| Quick JSON Schema only | uv run poe buildschema |
Constraints and Invariants
Model Configuration
All models inherit from iscc_schema.base.BaseModel with:
- extra="forbid": unknown fields raise ValidationError by default; generated Service objects (TDM, GenAI, Identifiers), nested identifier items, and their inline forms in IsccMeta override this to extra="allow" for forward-compatible service signals
- validate_assignment=True: assignment validates at runtime
- use_enum_values=True: enums serialize to string values
- populate_by_name=True: accept both context_ and @context
- from_attributes=True: models can be built from objects with matching attributes (ORM-style)
Empty String Coercion
The @model_validator(mode="before") in base.py converts falsy values on declared schema fields to None, except empty lists on required list fields, which are preserved so list constraints such as minItems can run. Combined with exclude_none=True in serialization, empty optional declared values are silently dropped. Unknown extension fields are preserved as supplied, including 0.0, [], {}, "", and False.
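The coercion rule can be pictured with a small standalone function. This is only a sketch of the behavior described above, not the validator in base.py; the function name coerce_empty and its declared and required_lists parameters are hypothetical.

```python
def coerce_empty(data: dict, declared: set, required_lists: frozenset = frozenset()) -> dict:
    """Return a copy of data with falsy declared-field values replaced by None."""
    result = {}
    for key, value in data.items():
        if key not in declared:
            # Unknown extension field: preserved exactly as supplied.
            result[key] = value
        elif value == [] and key in required_lists:
            # Empty list on a required list field: kept so that list
            # constraints such as minItems can still reject it.
            result[key] = value
        elif not value:
            # Falsy value on a declared field: becomes None.
            result[key] = None
        else:
            result[key] = value
    return result


coerced = coerce_empty(
    {"name": "", "keywords": [], "items": [], "x_custom": "", "width": 640},
    declared={"name", "keywords", "items", "width"},
    required_lists=frozenset({"items"}),
)
```

Here name and keywords become None, items stays an empty list, and the undeclared x_custom keeps its empty string.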
URL Validation
AnyUrl in fields.py validates against RFC 3986. Accepts any scheme (http, https, ipfs, data, urn). Empty string → None via the empty-string coercion.
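To make "accepts any scheme" concrete, the following sketch checks only the scheme part of a URI against the grammar in RFC 3986 section 3.1. It is a deliberately simplified stand-in: the actual regex in fields.py is not reproduced here, and looks_like_uri is a hypothetical helper name.

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# followed by ":" and a non-empty remainder without whitespace.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:\S+$")


def looks_like_uri(value: str) -> bool:
    """Return True if value starts with a syntactically valid URI scheme."""
    return bool(SCHEME_RE.match(value))


samples = [
    "https://example.com/a",
    "ipfs://example-cid",  # placeholder identifier
    "data:text/plain,hello",
    "urn:isbn:9783161484100",
]
results = [looks_like_uri(s) for s in samples]
```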
Field Aliases
Three fields use aliases because @ and $ are invalid in Python identifiers:
| JSON property | Python field | Alias |
|---|---|---|
| @context | context_ | @context |
| @type | type_ | @type |
| $schema | schema_ | $schema |
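The alias mapping is small enough to express as a plain dict. The sketch below shows the two directions of the rename on ordinary dicts; the helper names are hypothetical, and in the real models Pydantic handles this through field aliases and populate_by_name.

```python
# Same three pairs as the table above (JSON property -> Python field).
ALIASES = {"@context": "context_", "@type": "type_", "$schema": "schema_"}
FIELD_TO_ALIAS = {field: alias for alias, field in ALIASES.items()}


def to_python_names(data: dict) -> dict:
    """Rename JSON property names to valid Python field names."""
    return {ALIASES.get(key, key): value for key, value in data.items()}


def to_json_names(data: dict) -> dict:
    """Rename Python field names back to their JSON property names."""
    return {FIELD_TO_ALIAS.get(key, key): value for key, value in data.items()}


wire = {"$schema": "https://example.org/schema.json", "name": "Example"}  # placeholder URL
internal = to_python_names(wire)
```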
Extension Fields
Authored in the YAML source:
| Extension | Values | Purpose |
|---|---|---|
| x-iscc-context | IRI string | JSON-LD predicate IRI for the property |
| x-iscc-enum-context | {token: IRI} map | Per-enum-value class IRIs for the JSON-LD context (e.g. stm.yaml resource_type); context-only, kept out of the Pydantic model |
| x-iscc-status | stable / draft | Field maturity indicator |
| x-iscc-category | Category name (e.g. nft) | Groups related fields |
| x-iscc-standard | Standard name (e.g. ISO 24138:2024) | Source standard reference |
| x-iscc-schema-doc | Text | Original schema.org definition |
| x-iscc-embed | Text | Media embedding guidance |
| x-iscc-example-titles | List of strings | Titles for a schema's multiple examples in generated docs (e.g. iscc-note.yaml) |
x-iscc-jsonld is generated, not authored — build_json_schema.py writes it into each standalone schema JSON to document the compact→JSON-LD upgrade path.
Generated File Post-Processing
build_code.py applies two patches after code generation:
- Import swap: pydantic.BaseModel → iscc_schema.base.BaseModel, pydantic.AnyUrl → iscc_schema.fields.AnyUrl
- URL versioning: http://purl.org/iscc/context → http://purl.org/iscc/context/{version}.jsonld, http://purl.org/iscc/schema → http://purl.org/iscc/schema/{version}.json
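The URL versioning patch can be approximated with two regex substitutions over the generated source text. This is a sketch under assumptions: version_urls is a hypothetical name, the version string below is made up, and build_code.py may well implement the patch differently.

```python
import re


def version_urls(source: str, version: str) -> str:
    """Append a version suffix to bare ISCC context and schema URLs."""
    # The lookahead requires a closing quote right after the bare URL,
    # so URLs that already carry a version suffix are left untouched.
    source = re.sub(
        r"http://purl\.org/iscc/context(?=[\"'])",
        f"http://purl.org/iscc/context/{version}.jsonld",
        source,
    )
    source = re.sub(
        r"http://purl\.org/iscc/schema(?=[\"'])",
        f"http://purl.org/iscc/schema/{version}.json",
        source,
    )
    return source


generated = 'context = "http://purl.org/iscc/context"\nschema = "http://purl.org/iscc/schema"\n'
patched = version_urls(generated, "1.2.3")  # "1.2.3" is an illustrative version
```

Because of the lookahead, running the function a second time on already patched text changes nothing.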
Side Effects Catalog
| Method / Function | Effect |
|---|---|
| IsccMeta(...) | Validates all fields, coerces empty declared-field values to None |
| meta.dict() | Pure: returns new dict, no mutation |
| meta.json() | Pure: returns JSON string, no mutation |
| meta.jcs() | Pure: returns canonical bytes, no mutation |
| recover_context(data) | Pure: returns a new dict with @context prepended (input unchanged) |
| poe buildcode | Overwrites schema.py, generator.py, seed/service modules |
| poe buildschema | Overwrites docs/schema/*.json, iscc_schema/contexts.py |
| poe buildcontext | Overwrites docs/context/*.jsonld |
| poe buildterms | Overwrites docs/includes/terms-*.md |
| poe builddocs | Overwrites docs/index.md, docs/changelog.md, docs/schema/*.md, docs/context/index.md |
Task Recipes
Add a new field to IsccMeta
- Choose the appropriate YAML schema file in iscc_schema/models/ based on the field's category
- Add the property with type, description, and x-iscc-* extension fields
- Add the field name to the priority list in tools/build_json_schema.py for property ordering
- Run uv run poe all to regenerate everything and run tests
- Verify the field appears in schema.py, docs/schema/iscc.json, and vocabulary docs
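The reason the priority list matters can be seen in a short ordering sketch: names in the list sort to the front in list order, and anything missing keeps its original relative order at the bottom. The function order_properties is hypothetical and only models the documented effect; it is not taken from tools/build_json_schema.py.

```python
def order_properties(properties: dict, priority: list) -> dict:
    """Order properties by a priority list; unlisted names go last."""
    rank = {name: index for index, name in enumerate(priority)}
    # sorted() is stable, so unlisted names (all ranked len(priority))
    # keep the order they had in the input.
    ordered = sorted(properties, key=lambda name: rank.get(name, len(priority)))
    return {name: properties[name] for name in ordered}


props = {"zeta": {}, "name": {}, "iscc": {}, "alpha": {}}
ordered_names = list(order_properties(props, ["iscc", "name"]))
```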
Add a new standalone schema (seed, service, or protocol)
Standalone schemas come in three categories: seed (compact JSON, for Meta-Code generation),
service (JSON-LD by default, for registry discovery), and protocol (compact JSON with a
version-specific $schema, for ISCC Discovery Protocol records like IsccNote).
- Create iscc_schema/models/{name}.yaml with title, type, properties
- Add to SEED_SCHEMAS, SERVICE_SCHEMAS, or PROTOCOL_SCHEMAS in tools/build_code.py
- Add to the matching list in tools/build_json_schema.py
- Add to the matching list in tools/build_json_ld_context.py (service/protocol schemas also need their @type IRI in the hardcoded context dict, e.g. "IsccNote": ".../terms/#IsccNote")
- Add to the matching list in tools/build_terms.py
- Add to the matching list and to STANDALONE_META in tools/build_docs.py
- Export the new model from iscc_schema/__init__.py
- Add the schema's doc page to the nav in zensical.toml and to PAGES in tools/gen_llms_full.py
- Run uv run poe all
Recover JSON-LD context from plain JSON
from iscc_schema import recover_context
data = {"iscc": "ISCC:KACYPXW445FTYNJ3", "name": "Example"}
data_with_context = recover_context(data)
# Adds @context from SCHEMA_CONTEXTS[SCHEMA_ISCC]
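For intuition about what such a recovery step involves, here is a standalone sketch that looks up a context by $schema, falls back to @type, and finally to a default schema, returning a new dict with @context first. Everything in it is an assumption for illustration: the function name, the lookup order, the fallback behavior, and the example.org URLs are hypothetical and do not reproduce recovery.py or contexts.py.

```python
def recover_context_sketch(data, schema_contexts, type_schemas, default_schema):
    """Return a new dict with @context first; the input dict is not modified."""
    schema = data.get("$schema")
    if schema is None:
        # No $schema: try to dispatch on @type, else use the default schema.
        schema = type_schemas.get(data.get("@type"), default_schema)
    return {"@context": schema_contexts[schema], **data}


# Hypothetical lookup tables with placeholder URLs.
SCHEMA_CONTEXTS = {
    "https://example.org/schema/iscc.json": "https://example.org/context/iscc.jsonld",
    "https://example.org/schema/tdm.json": "https://example.org/context/tdm.jsonld",
}
TYPE_SCHEMAS = {"TDM": "https://example.org/schema/tdm.json"}
DEFAULT_SCHEMA = "https://example.org/schema/iscc.json"

plain = {"iscc": "ISCC:KACYPXW445FTYNJ3", "name": "Example"}
recovered = recover_context_sketch(plain, SCHEMA_CONTEXTS, TYPE_SCHEMAS, DEFAULT_SCHEMA)
typed = recover_context_sketch({"@type": "TDM"}, SCHEMA_CONTEXTS, TYPE_SCHEMAS, DEFAULT_SCHEMA)
```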
Bypass validation (downstream pattern)
from iscc_schema import IsccMeta

# model_construct() skips validation and coercion; use only for trusted internal data
meta = IsccMeta.model_construct(iscc="ISCC:...", name="...")
Change Playbook
If modifying a YAML schema property
- Run uv run poe all, which regenerates code, JSON Schema, context, and docs, then runs tests
- Check generated schema.py for correct field type and alias
- Check docs/schema/iscc.json for correct property definition
- If the property has x-iscc-context, check docs/context/iscc.jsonld and contexts.py
If modifying base.py
- Run uv run pytest (tests cover serialization, coercion, JCS)
- Check that dict(), json(), jcs() defaults still match expectations
- Downstream impact: iscc-sdk subclasses IsccMeta and uses .construct() extensively
If modifying fields.py (AnyUrl)
- Run uv run pytest (tests cover URL validation patterns)
- Check that valid URIs (http, ipfs, data, urn) are still accepted
- Check that empty string → None coercion still works
If modifying build_code.py
- Run uv run poe buildcode && uv run pytest
- Verify that the post-processing patches were applied correctly in generated files
- Check that import statements in schema.py reference iscc_schema.base.BaseModel
If modifying build_json_schema.py
- Run uv run poe buildschema
- Verify that the docs/schema/iscc.json property order matches the priority list
- Verify that contexts.py was regenerated with correct mappings
- Verify that the @context property accepts both a URI string and an inline object
If adding a new x-iscc-* extension field
- Update all tools/build_*.py scripts that inspect extension fields
- Update tools/build_docs.py to render the new extension in documentation
- Run uv run poe all
Common Mistakes
NEVER edit schema.py, generator.py, seed_*.py, service_*.py, protocol_*.py, or contexts.py directly: they are generated files and are overwritten by poe buildcode / poe buildschema. ALWAYS edit the source YAML schemas and run the build pipeline.
NEVER add a new YAML schema field without adding it to the priority list in tools/build_json_schema.py. Fields missing from this list end up unsorted at the bottom of the generated JSON Schema.
NEVER use pydantic.BaseModel or pydantic.AnyUrl in hand-written code that extends iscc-schema. ALWAYS use iscc_schema.base.BaseModel and iscc_schema.fields.AnyUrl — they provide JSON-LD serialization, empty-string coercion, and RFC 3986 validation.
NEVER assume fields are present in serialized output. exclude_none=True and exclude_unset=True mean only explicitly set, non-None fields appear. ALWAYS check for key existence when consuming IsccMeta dicts.
NEVER pass by_alias=False to dict() or json() unless you specifically need Python field names (context_, type_, schema_). The default by_alias=True produces JSON-LD compatible output with @context, @type, $schema.
NEVER add a standalone schema without updating ALL five build scripts (build_code.py, build_json_schema.py, build_json_ld_context.py, build_terms.py, build_docs.py) plus public exports and docs navigation. Missing any one causes incomplete output.
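As an illustration of the rule above about absent keys in serialized output, a consumer can read optional fields defensively with dict.get. The summarize helper is hypothetical; the input stands in for the result of an IsccMeta dict() call in which unset fields were dropped.

```python
def summarize(record: dict) -> str:
    """Build a one-line summary without assuming optional keys exist."""
    # Only explicitly set, non-None fields are serialized, so every
    # optional key may be missing entirely.
    name = record.get("name", "(untitled)")
    creator = record.get("creator")
    if creator is None:
        return f"{record['iscc']}: {name}"
    return f"{record['iscc']}: {name} by {creator}"


summary = summarize({"iscc": "ISCC:KACYPXW445FTYNJ3", "name": "Example"})
```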