RFC-016: Collection Operations

Status: Proposed
Date: 2026-04-07
Authors: Anne Schuth

Context

Dutch law frequently reasons about collections of variable length: children in a household, employment periods, registrations, household members. The current v0.5.1 operation set has no way to iterate over a collection and apply per-element logic.

Real examples from Dutch law

Kindgebonden budget (artikel 2 WKB). The leeftijdstoeslag depends on each child's age: €703 for children aged 12-15, €936 for children aged 16-17. The law text reads: "Voor een kind dat 12 jaar of ouder is, maar jonger is dan 16 jaar bedraagt de verhoging van het kindgebonden budget € 703." The number of children varies per household.

Wet BRP / Huurtoeslag. Counting household members for income thresholds. Each member's income contribution depends on their age (above or below 21). The law says "medebewoners" without specifying a maximum.

Burgerlijk Wetboek. Filtering active registrations (curatele, bewind, mentorschap, executeurschap, volmacht) from a registry. Whether a registration is "actief" is a legal determination that depends on status fields, not a data-layer concern.

AWB bezwaar/beroep. Counting relevant procedural events (submissions, decisions) from a case history to determine whether deadlines have passed or rights have been exercised.

The alternative without iteration

Without a collection operation, the law author must pre-aggregate data in the data source layer:

yaml
# Pre-aggregated: data source provides category counts
- output: leeftijdstoeslagen
  operation: ADD
  values:
    - operation: MULTIPLY
      values: [$aantal_kinderen_12_15, $extra_12_15_jaar]
    - operation: MULTIPLY
      values: [$aantal_kinderen_16_17, $extra_16_17_jaar]

This works for simple sums-by-category. But it pushes the legal thresholds (12, 16, 17) into the data source layer. The data source must know what "12 jaar of ouder" means in the context of this specific law, which is exactly what regelrecht aims to avoid.

For filter-and-transform patterns (e.g., selecting active registrations), pre-aggregation requires the data source to understand legal concepts like "actief bewind". That couples the data layer to legal semantics.

Decision

Add a FOREACH operation to the schema and engine. FOREACH iterates over a collection, evaluates an expression per element with the element bound to a local variable, and optionally aggregates results.

YAML syntax

yaml
operation: FOREACH
collection: $kinderen_leeftijden     # array to iterate over
as: kind                             # local variable name (optional, defaults to "item")
body:                                # expression evaluated per element
  operation: IF
  cases:
    - when:
        operation: GREATER_THAN_OR_EQUAL
        subject: $kind
        value: 16
      then: $extra_16_17_jaar
    - when:
        operation: GREATER_THAN_OR_EQUAL
        subject: $kind
        value: 12
      then: $extra_12_15_jaar
  default: 0
combine: ADD                         # aggregation (optional)
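
As a hand-checkable illustration of the example above, the following Rust sketch mirrors the IF cases and the ADD combine. It is not engine code: the function names are hypothetical, and the amounts are hard-coded from the law text quoted in this RFC. Like the YAML, it does not cap the age at 17, so it assumes the collection only contains children under 18.

```rust
/// Per-child supplement, mirroring the IF cases in the YAML above.
/// The first matching case wins, so `>= 16` is checked before `>= 12`.
/// Amounts are the euro values quoted in this RFC (hypothetical constants).
fn toeslag_for_age(age: u32) -> u32 {
    const EXTRA_12_15_JAAR: u32 = 703;
    const EXTRA_16_17_JAAR: u32 = 936;
    if age >= 16 {
        EXTRA_16_17_JAAR
    } else if age >= 12 {
        EXTRA_12_15_JAAR
    } else {
        0 // the `default: 0` branch
    }
}

/// FOREACH over the ages with `combine: ADD`.
fn leeftijdstoeslagen(ages: &[u32]) -> u32 {
    ages.iter().map(|&age| toeslag_for_age(age)).sum()
}
```

For a household with children aged 9, 13 and 16, this yields 0 + 703 + 936 = 1639.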

With optional filter:

yaml
operation: FOREACH
collection: $curatele_registraties
as: registratie
filter:                              # skip elements where this evaluates to false
  operation: EQUALS
  subject: $registratie.status
  value: ACTIEF
body: $registratie

Counting events (AWB bezwaar):

yaml
# Count the number of objection submissions in the event history
operation: FOREACH
collection: $gebeurtenissen
as: event
filter:
  operation: EQUALS
  subject: $event.event_type
  value: BEZWAAR_INGEDIEND
body: 1
combine: ADD

Household income aggregation (huurtoeslag):

yaml
# Sum income contributions, with different rules per age group
operation: FOREACH
collection: $huishoudleden
as: bewoner
body:
  operation: IF
  cases:
    - when:
        operation: GREATER_THAN_OR_EQUAL
        subject: $bewoner.leeftijd
        value: 21
      then: $bewoner.inkomen
  default:
    operation: SUBTRACT
    values:
      - $bewoner.inkomen
      - $kind_vrijstelling
combine: ADD

Property naming rationale

FOREACH introduces properties that don't exist in other operations. The names are chosen to be distinct from existing property semantics:

| Property | Why this name |
|---|---|
| collection | Distinct from subject (used for comparisons) and values (used for arithmetic). Describes what it is: the collection to iterate. |
| body | Distinct from value (used for comparison target and action assignment). Describes what it is: the expression body to evaluate per element. |
| as | Standard iteration variable binding, familiar from SQL and template languages. |
| filter | Distinct from conditions (used for AND/OR). Describes intent: filtering elements. |
| combine | Describes intent: combining per-element results into a single value. |

Variable binding with as

FOREACH is the only operation that introduces a new variable name into scope. This is a new concept in the schema: all other operations reference existing variables; none define them.

The as parameter names a local variable that exists only within the body and filter expressions of that FOREACH. It shadows any outer variable with the same name. When as is omitted, the default name is item. The default item is chosen as a neutral, language-independent term that does not collide with common domain variable names (unlike element which could conflict with XML-related fields, or current which suggests temporal context).

The filter expression runs in the child scope where the as variable is already bound. This means the filter can access element properties: $registratie.status works because $registratie is the current element.

Nested FOREACH scoping: Each FOREACH creates an independent child scope. The scoping rules for nested FOREACH follow from step 1 of the Semantics section below: collection is always evaluated in the scope where its FOREACH appears, before the child context is created. For nested FOREACH, this means:

  1. The outer FOREACH evaluates collection: $households in the top-level scope. For each household, a child scope is created with $household bound.
  2. The inner FOREACH appears inside the outer FOREACH's body, so it executes in the outer child scope. Its collection: $household.members is evaluated there, where $household is available.
  3. The inner FOREACH then creates its own child scope (empty locals) and binds $member. The inner body and filter run in this inner child scope, where $member is visible but $household is not (it lives in the parent scope, not the child's locals).

The collection expression is the mechanism through which outer variables are accessed at the boundary between scopes. There is no other way to pass outer variables into an inner FOREACH's child scope.

yaml
# Nested: outer $household, inner $member
operation: FOREACH
collection: $households
as: household
body:
  operation: FOREACH
  collection: $household.members     # evaluated in outer child scope → $household is available
  as: member
  body: $member.income               # evaluated in inner child scope → sees $member, not $household
  combine: ADD
combine: ADD

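If an inner FOREACH uses the same as name as an outer one, the inner binding shadows the outer one within the inner body and filter. Distinct as names avoid that ambiguity. Note, however, that under the scoping rules above the outer element is reachable only from the inner collection expression, not from the inner body.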
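
The scoping rules can be sketched in Rust as follows. This is an illustrative model, not the engine's actual context type: `Context`, `child`, `bind` and `resolve` are hypothetical names, values are reduced to integers, and the sketch assumes that top-level variables (such as `$extra_12_15_jaar` in the earlier example) stay visible in every child scope while FOREACH-local bindings do not carry over.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified execution context: top-level variables plus
/// FOREACH-local bindings. Values are plain integers to keep this small.
struct Context<'a> {
    top_level: &'a HashMap<String, i64>,
    locals: HashMap<String, i64>,
}

impl<'a> Context<'a> {
    fn new(top_level: &'a HashMap<String, i64>) -> Self {
        Context { top_level, locals: HashMap::new() }
    }

    /// A child scope starts with empty locals; it does not copy the parent's.
    fn child(&self) -> Context<'a> {
        Context { top_level: self.top_level, locals: HashMap::new() }
    }

    /// Bind the `as` variable in this scope.
    fn bind(&mut self, name: &str, value: i64) {
        self.locals.insert(name.to_string(), value);
    }

    /// Locals win over top-level variables with the same name (shadowing).
    fn resolve(&self, name: &str) -> Option<i64> {
        self.locals.get(name).or_else(|| self.top_level.get(name)).copied()
    }
}
```

In this model, an inner scope created from the outer child scope resolves `member` and any top-level variable, while `household` resolves to nothing, which matches the behaviour described for the nested example.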

Schema definition

json
"foreachOperation": {
  "type": "object",
  "required": ["operation", "collection", "body"],
  "additionalProperties": false,
  "properties": {
    "operation": { "const": "FOREACH" },
    "collection": {
      "$ref": "#/definitions/operationValue",
      "description": "Expression that evaluates to an array."
    },
    "as": {
      "type": "string",
      "pattern": "^[a-z_][a-z0-9_]*$",
      "description": "Local variable name bound to the current element. Defaults to 'item'."
    },
    "body": {
      "$ref": "#/definitions/operationValue",
      "description": "Expression evaluated for each element."
    },
    "filter": {
      "$ref": "#/definitions/operationValue",
      "description": "Boolean expression evaluated in the child scope. Elements where this evaluates to false are skipped."
    },
    "combine": {
      "type": "string",
      "enum": ["ADD", "OR", "AND", "MIN", "MAX"],
      "description": "Aggregation applied to collected results. When omitted, results are returned as an array."
    },
    "legal_basis": { "$ref": "#/definitions/legalBasis" }
  }
}

Semantics

  1. Evaluate collection in the current scope (the scope where the FOREACH operation appears) to get an array. If the result is null, treat it as an empty array. If it is any other non-array value, wrap it in a single-element array.
  2. For each element in the array:
    a. Create a child execution context (isolated local scope, empty locals).
    b. Bind the element to the local variable named by as (default: item).
    c. If filter is present, evaluate it in the child context. If the result is falsy, skip this element. A Null or Untranslatable filter result is not skipped; see Error handling.
    d. Evaluate body in the child context. Collect the result.
  3. If combine is specified, apply the aggregation to collected results and return a single value.
  4. If combine is omitted, return the collected results as an array.
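
The steps above, for the case without combine, can be sketched in Rust as follows. This is a minimal model under stated assumptions: `Value` here is a cut-down stand-in for the engine's value type, `to_elements` and `foreach_collect` are hypothetical names, filter and body are passed as closures rather than evaluated expressions, and an absent filter is modelled as a closure that always returns true. Errors, Null filters and Untranslatable are left out.

```rust
/// Cut-down stand-in for the engine's value type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Array(Vec<Value>),
}

/// Step 1: normalise the evaluated `collection` into a list of elements.
fn to_elements(collection: Value) -> Vec<Value> {
    match collection {
        Value::Null => Vec::new(),       // null is treated as an empty array
        Value::Array(items) => items,    // arrays are used as-is
        other => vec![other],            // any other value is wrapped
    }
}

/// Steps 2 and 4: filter, evaluate body per element, return an array.
fn foreach_collect(
    collection: Value,
    filter: impl Fn(&Value) -> bool,
    body: impl Fn(&Value) -> Value,
) -> Value {
    let mut results = Vec::new();
    for element in to_elements(collection) {
        if !filter(&element) {
            continue; // step 2c: skip elements the filter rejects
        }
        results.push(body(&element)); // step 2d: collect the body result
    }
    Value::Array(results)
}
```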

Error handling

If body produces an error for an element, the FOREACH operation propagates the error immediately. Partial results are not returned. Rationale: legal computations must be complete - a partial sum over "some children" is not a valid legal determination.

If filter produces an error, the same rule applies: the error propagates and FOREACH fails.

If any element produces Value::Untranslatable (per RFC-012), the combined result is Value::Untranslatable. Untranslatable taints the entire collection result, because a partial determination that silently drops untranslatable elements would be misleading.

If filter itself evaluates to Untranslatable, the FOREACH propagates the untranslatable immediately (same as body). If filter evaluates to Null (e.g., a referenced property is absent), the FOREACH also propagates Null immediately, because an unknown filter result means the engine cannot determine whether the element belongs in the collection. Elements whose filter evaluates to a definitive false are skipped and do not contribute to untranslatable or null detection.
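
A minimal Rust sketch of the propagation rules for body, assuming a simplified result type: `Outcome` and `foreach_add` are hypothetical names, errors are plain strings, and the filter is not modelled. The `?` operator aborts on the first error, and an untranslatable element ends the loop immediately, so no partial sum is ever returned.

```rust
/// Simplified per-element result (hypothetical stand-in for the engine's Value).
#[derive(Debug, PartialEq)]
enum Outcome {
    Number(i64),
    Untranslatable,
}

/// FOREACH with `combine: ADD` over integers, with fail-fast propagation.
fn foreach_add<B>(elements: &[i64], body: B) -> Result<Outcome, String>
where
    B: Fn(i64) -> Result<Outcome, String>,
{
    let mut total = 0;
    for &element in elements {
        // `?` propagates a body error immediately; partial results are dropped.
        match body(element)? {
            Outcome::Number(n) => total += n,
            // One untranslatable element taints the whole result.
            Outcome::Untranslatable => return Ok(Outcome::Untranslatable),
        }
    }
    Ok(Outcome::Number(total))
}
```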

Dot notation for element properties

When iterating over arrays of objects, dot notation accesses properties:

yaml
collection: $curatele_registraties   # [{status: "ACTIEF", bsn_curator: "123"}, ...]
as: reg
body: $reg.bsn_curator               # accesses the bsn_curator property

This uses existing dot notation support in variable resolution.

Object field flattening

When iterating over arrays of objects, the object's fields are injected as local variables alongside the as binding. This allows $status as a shorthand for $reg.status. Existing law YAML files use this pattern extensively, so the engine supports both forms.

Note: flattened field names can shadow outer-scope variables if they collide. Law authors should use distinct as names and prefer dot notation ($reg.status) when clarity matters.
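
The collision risk can be illustrated with a small Rust sketch. It is a simplification, not the engine's resolution logic: `bind_element` and `resolve` are hypothetical names, all values are strings, and dot notation is modelled as a literal `"reg.status"` key purely to show both forms side by side.

```rust
use std::collections::HashMap;

/// Build the locals for one element: each field is reachable both via
/// dot notation ("reg.status") and as a flattened name ("status").
fn bind_element(as_name: &str, fields: &[(&str, &str)]) -> HashMap<String, String> {
    let mut locals = HashMap::new();
    for (field, value) in fields {
        locals.insert(format!("{}.{}", as_name, field), value.to_string());
        locals.insert(field.to_string(), value.to_string());
    }
    locals
}

/// Locals are consulted first, so a flattened field shadows an outer
/// variable with the same name.
fn resolve<'a>(
    name: &str,
    locals: &'a HashMap<String, String>,
    outer: &'a HashMap<String, String>,
) -> Option<&'a str> {
    locals.get(name).or_else(|| outer.get(name)).map(|s| s.as_str())
}
```

With an outer variable named `status`, a lookup of `status` inside the loop returns the element's field rather than the outer value, whereas `reg.status` is unambiguous.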

Combine operations

| Combine | Description | Empty collection |
|---|---|---|
| ADD | Sum numeric results (polymorphic: concatenates strings/arrays per RFC-007) | 0 |
| OR | Logical: any result truthy | false |
| AND | Logical: all results truthy | true |
| MIN | Minimum value | null |
| MAX | Maximum value | null |
| (omitted) | Collect results as array | [] |

Why only these five combiners? The combine operations map to meaningful legal aggregation patterns:

  • ADD: "het totaal van alle bedragen" (the total of all amounts)
  • OR: "indien ten minste een van de voorwaarden is vervuld" (if at least one condition is met)
  • AND: "indien aan alle voorwaarden is voldaan" (if all conditions are met)
  • MIN/MAX: "het laagste/hoogste van de bedragen" (the lowest/highest of the amounts)

SUBTRACT, MULTIPLY, and DIVIDE are excluded. SUBTRACT and DIVIDE are neither associative nor commutative, so folding them over a list is ambiguous: subtracting a list of values raises the question "from what?", and the result depends on element order. MULTIPLY is well defined over a list, but multiplying a list of values has no common legal pattern. If a specific law needs such an aggregation, it can be expressed by collecting results as an array (no combine) and then applying the arithmetic operation to the array.

Empty collection semantics: ADD returns 0 (additive identity), OR returns false, and AND returns true (standard logical identities). MIN and MAX return null because there is no meaningful minimum or maximum of nothing - the caller must handle this case. When combine is omitted, an empty collection produces an empty array [].

Note: ADD on an empty collection always returns 0 (integer zero), regardless of what type the body expression would have produced. Since the collection is empty, no body expression runs and the engine cannot infer the intended type. If the caller expects a string or array result from an empty collection, it should handle the empty case explicitly (e.g., with an IF guard before the FOREACH).

Note: ADD is polymorphic per RFC-007. When all results are strings, ADD concatenates them. When all results are arrays, ADD concatenates them into a single array. This covers string-building use cases (e.g., assembling a list of names) without a separate CONCAT combiner.
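
The empty-collection results follow the usual fold identities, which the Rust standard iterator methods happen to share. The sketch below is illustrative only: the function names are hypothetical, ADD is limited to integers (the polymorphic string and array cases are not modelled), and `None` stands in for null.

```rust
/// ADD: sum of the results; an empty slice sums to 0.
fn combine_add(results: &[i64]) -> i64 {
    results.iter().sum()
}

/// OR: true if any result is true; false for an empty slice.
fn combine_or(results: &[bool]) -> bool {
    results.iter().any(|&b| b)
}

/// AND: true if all results are true; true for an empty slice.
fn combine_and(results: &[bool]) -> bool {
    results.iter().all(|&b| b)
}

/// MIN: smallest result, or None (null) when there are no results.
fn combine_min(results: &[i64]) -> Option<i64> {
    results.iter().copied().min()
}

/// MAX: largest result, or None (null) when there are no results.
fn combine_max(results: &[i64]) -> Option<i64> {
    results.iter().copied().max()
}
```

Because MIN and MAX return an optional value here, the caller is forced to decide what an empty collection means, which is the behaviour the table prescribes.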

Security constraints

  • Maximum iteration count: MAX_ARRAY_SIZE (existing engine config, default 1000). If the collection exceeds this, the engine returns an error.
  • Maximum nesting depth: FOREACH increments depth for recursive evaluation, bounded by MAX_OPERATION_DEPTH.
  • All collections originate from finite data sources. The schema does not support generators or lazy sequences.
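
A guard for the first two constraints might look like the following Rust sketch. The function name and error strings are hypothetical. `MAX_ARRAY_SIZE` uses the default quoted above, while the depth limit is passed in as a parameter because this RFC does not state the value of MAX_OPERATION_DEPTH. Treating a depth equal to the limit as allowed is an assumption.

```rust
/// Default quoted in this RFC for the existing engine setting.
const MAX_ARRAY_SIZE: usize = 1000;

/// Check a FOREACH against the size and depth limits before iterating.
/// `max_depth` stands in for MAX_OPERATION_DEPTH, whose value is not given here.
fn check_foreach_limits(
    collection_len: usize,
    depth: usize,
    max_depth: usize,
) -> Result<(), String> {
    if collection_len > MAX_ARRAY_SIZE {
        return Err(format!(
            "collection has {} elements, limit is {}",
            collection_len, MAX_ARRAY_SIZE
        ));
    }
    if depth > max_depth {
        return Err(format!("operation depth {} exceeds limit {}", depth, max_depth));
    }
    Ok(())
}
```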

Why

Benefits

Legal logic stays in law YAML. Age thresholds, status checks, and permission rules are legal decisions. The data source provides raw facts (list of children with birth dates); the law determines what to do with them.

Matches legal language. Legislators write "voor elk kind", "alle actieve registraties", "medebewoners van 21 jaar of ouder". These are not separate filter-map-reduce steps in legal text - they are single clauses that combine selection and transformation. FOREACH with filter and combine maps to this integrated phrasing. Splitting into separate MAP, FILTER, REDUCE operations would force a decomposition that the law text does not make.

Engine infrastructure exists. The engine already has child context creation, local variable binding, and scoped variable resolution. The execution machinery is in place; only the operation dispatch is missing.

Concrete use cases. Multiple Dutch laws across toeslagen, BW delegaties, AWB procedures, and BRP household rules require per-element evaluation over variable-length collections. These are not hypothetical needs.

Tradeoffs

Variable binding is a new concept. Every other operation in the schema is purely referential - it reads existing variables but never creates them. as introduces a definition point. This makes FOREACH fundamentally different from arithmetic or logical operations.

Non-termination risk. Mitigated by MAX_ARRAY_SIZE (collection size limit) and MAX_OPERATION_DEPTH (nesting limit). Both are existing engine configuration values.

Pre-aggregation works for simple cases. When the pattern is purely "count items in categories," pre-aggregation in the data source is simpler. FOREACH is needed when per-element logic involves legal conditions, or when the output is a transformed collection rather than a single aggregate.

Alternatives Considered

Pre-aggregation in data sources. Push all counting and filtering to the data layer. Rejected: this works for simple sums but moves legal conditions (age thresholds, status definitions) out of law YAML. The boundary between data and law becomes unclear.

Fixed maximum with unrolled operations. Generate N branches for up to N items. Rejected: arbitrary limits, verbose YAML, does not handle filter-and-transform patterns, and breaks when the real count exceeds N.

Separate MAP, FILTER, REDUCE operations. Three operations following functional programming conventions. Rejected: Dutch legal text does not decompose collection logic into separate functional steps. A clause like "de som van de bedragen voor elk kind dat 12 jaar of ouder is" combines filtering (12 jaar of ouder), transformation (het bedrag), and aggregation (de som) in a single sentence. Three operations would require intermediate outputs (filtered_children, child_amounts, total) that exist nowhere in the law. FOREACH with filter and combine keeps the YAML close to the legal text.

No iteration, restructure all laws. Accept that laws needing iteration must be restructured to avoid it. Rejected: this is possible for simple aggregation cases but not for filter-and-transform patterns. It also forces legal knowledge into the data layer, which conflicts with regelrecht's design principle of keeping legal logic in law YAML.

Implementation Notes

Engine changes:

  • Add ForEach variant to ActionOperation enum in article.rs with fields: collection, as_name, body, filter, combine
  • Add execute_foreach() in operations.rs:
    1. Evaluate collection to Value::Array
    2. For each element: ctx.create_child(), ctx.set_local(as_name, element), optionally evaluate filter, evaluate body
    3. Apply combine aggregation or return array
  • Error propagation: any element error aborts the entire FOREACH
  • Trace: add PathNodeType::ForEachIteration with element index for execution tracing
  • Untranslatable propagation: if any element produces Value::Untranslatable, the combined result is Value::Untranslatable (per RFC-012)

Schema changes:

  • Add foreachOperation to definitions in schema/v0.5.x/schema.json
  • Add FOREACH to operationType enum
  • Add foreachOperation to the operation oneOf discriminator

Conformance tests:

  • foreach_basic.json: iterate over number array, combine with ADD
  • foreach_filter.json: iterate with filter clause, verify skipped elements
  • foreach_objects.json: iterate over object array, access properties via dot notation
  • foreach_nested.json: nested FOREACH with independent scopes, verify outer variable accessible in inner collection but not inner body
  • foreach_empty.json: empty and null collection handling per combine type
  • foreach_no_combine.json: collect results as array (no combine)
  • foreach_string_combine.json: combine with ADD on string results (concatenation)
  • foreach_error.json: error in body propagates, partial results not returned

References