Data normalisation
Registries differ widely in how they expose data such as names, addresses, and dates.
For addresses, some return a single-line string, while others split the address into several separate components.
For dates, some registries provide only a date without time or timezone, while others include partial timestamps or country-specific calendars.
Our engine standardises all of this into a consistent, developer-friendly structure while always preserving the registry's raw information.
This page explains how addresses, dates, geodata, and inferred components are built and how they interact with the merging logic.
Address structure
Every address returned by our engine includes:
| Field | Description |
|---|---|
| kind | Address type (e.g. registered, headquarters) |
| value | Raw address value from the registry |
| normalized | Canonical formatted address and multi-line form |
| components | Inferred structured breakdown of address parts |
| geo | Best-effort geolocation data |
| since / until | Validity bounds of the address, if known |
Only the top-level value and the raw date inside since / until are considered the registry's original data.
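The shape described above can be sketched as Python type hints. This is only an illustration: the class names, the nesting of since / until as date objects, and the `registry_original` helper are assumptions made for the example, not part of the engine's published schema.

```python
from typing import Dict, List, Optional, TypedDict


class NormalizedAddress(TypedDict):
    address: Optional[str]   # canonical formatted address
    lines: List[str]         # multi-line representation


class RegistryDate(TypedDict):
    date: str                    # raw registry date
    normalized: Optional[dict]   # inferred timestamp data


class Address(TypedDict):
    kind: str
    value: str                              # raw registry value
    normalized: NormalizedAddress
    components: Dict[str, Optional[str]]    # inferred breakdown
    geo: Dict[str, object]                  # best-effort geolocation
    since: Optional[RegistryDate]           # assumed to be a date object
    until: Optional[RegistryDate]


def registry_original(address: Address) -> dict:
    """Hypothetical helper: keep only what counts as original registry data."""
    since = address.get("since")
    until = address.get("until")
    return {
        "value": address["value"],
        "since": since["date"] if since else None,
        "until": until["date"] if until else None,
    }
```

A helper like this can be handy when an audit trail should store nothing but registry-sourced values.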
Raw address
The top-level value field is always:
- a single-line value
- based directly on the source registry
- trimmed and cleaned
- normalised in surface form (title case, consistent spacing)
- assembled verbatim when registries provide multiple components
This ensures that even fragmented registry data is represented clearly, without synthesis or interpretation.
{ "value": "Postboks 980 Skøyen, OSLO 0240, OSLO, Norway" }
This field is the ground truth for auditability.
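As a rough illustration of how fragmented registry components could be assembled into a single-line value, the sketch below trims each part, collapses internal whitespace, skips empty parts, and joins the rest. The function name `assemble_raw_value` and the `", "` separator are assumptions for the example; casing is deliberately left untouched, and the engine's actual cleaning rules may differ.

```python
import re
from typing import Iterable, Optional


def assemble_raw_value(parts: Iterable[Optional[str]]) -> str:
    """Join registry-provided parts in their original order, without reinterpretation."""
    cleaned = []
    for part in parts:
        if part is None:
            continue
        # Collapse runs of whitespace and trim the ends.
        text = re.sub(r"\s+", " ", part).strip()
        if text:
            cleaned.append(text)
    return ", ".join(cleaned)


print(assemble_raw_value(["Postboks 980  Skøyen ", None, "OSLO 0240", "OSLO", "Norway"]))
```

The parts are never reordered or merged, which keeps the result traceable to the registry's own fields.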
Normalised address
The normalized block contains:
| Field | Description |
|---|---|
| address | Canonical formatted address |
| lines[] | Multi-line representation of the address |
Normalisation may reorder components, correct casing, harmonise country names, and restructure lines for clarity.
The goal is to produce a consistent, human-friendly format that can be used directly in UIs or for comparison, while keeping the raw value intact for reference.
Inferred address components
The components object contains a structured breakdown:
| Field | Description |
|---|---|
| careOf | Care of / attention to |
| building | Building name, apartment block, entrance, etc. |
| houseNumber | House/building number |
| street | Street name |
| poBox | PO Box number |
| postcode | Postal code |
| locality | City, town, village |
| localityCode | City code (if applicable) |
| district | District, borough, municipality |
| districtCode | District code (if applicable) |
| state | State, province, region |
| stateCode | State code (if applicable) |
| country | Full country name |
| countryCode | ISO 3166-1 alpha-2 country code |
| other | Any remaining address fragments |
These components are inferred unless the registry provides them explicitly. Each value is either a string or null when it is not available.
They exist to make integration easier when a developer needs to target only one part of an address, e.g., extracting a postcode for form routing even if the registry doesn't provide one explicitly.
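Because any component may be null, client code should read components defensively. The sketch below shows one way to do that on an already-decoded JSON response; `get_component` is a hypothetical helper, not part of any SDK.

```python
from typing import Any, Mapping, Optional


def get_component(address: Mapping[str, Any], name: str) -> Optional[str]:
    """Return one inferred component, or None if it is missing, null, or blank."""
    components = address.get("components") or {}
    value = components.get(name)
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


address = {
    "value": "Postboks 980 Skøyen, OSLO 0240, OSLO, Norway",
    "components": {"postcode": "0240", "street": None},
}
postcode = get_component(address, "postcode")  # usable for form routing
street = get_component(address, "street")      # None: fall back to the raw value
```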
How inference works
- Parsing heuristics detect house numbers, PO Boxes, etc.
- Subdivision names and codes come from our geocoder, if not provided by the registry
- Country fields are harmonised and never taken verbatim from user input
- Missing components are returned as null
- All inferred values remain non-authoritative, and the raw address is always preserved
This approach gives developers a reliable surface while keeping registry truth intact.
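To give a feel for what a parsing heuristic of this kind might look like, here is a deliberately small sketch that detects a PO Box number in a raw address. The pattern, the keyword list, and the name `infer_po_box` are illustrative assumptions; the engine's real heuristics are not documented here and are certainly broader.

```python
import re
from typing import Optional

# Illustrative keywords only: English "PO Box" variants and Norwegian "Postboks".
PO_BOX_PATTERN = re.compile(
    r"\b(?:P\.?\s?O\.?\s?Box|Postboks)\s+(\d+)\b",
    re.IGNORECASE,
)


def infer_po_box(raw_value: str) -> Optional[str]:
    """Return the PO Box number if one is found, else None (never a guess)."""
    match = PO_BOX_PATTERN.search(raw_value)
    return match.group(1) if match else None


print(infer_po_box("Postboks 980 Skøyen, OSLO 0240, OSLO, Norway"))
```

Returning None instead of a best guess mirrors the rule that missing components are reported as null.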
Geolocation
The geo object is a best-effort enrichment powered by our own geocoding engine and external data partners.
Registries never provide coordinates. All location data is inferred from the address or postcode.
| Field | Description | Example |
|---|---|---|
| precision | Granularity of the geolocation | postcode |
| lat | Latitude in decimal format | 59.9139 |
| lon | Longitude in decimal format | 10.7522 |
| latDMM | Latitude in degrees + decimal minutes format | 59°54.834'N |
| lonDMM | Longitude in degrees + decimal minutes format | 10°45.132'E |
| latDMS | Latitude in degrees, minutes, seconds format | 59°54'50.04"N |
| lonDMS | Longitude in degrees, minutes, seconds format | 10°45'7.92"E |
Precision levels
The precision field indicates how specific the geolocation is:
| Precision | Description |
|---|---|
| rooftop | Exact building-level match |
| street | Matched at street level |
| locality | Town or city centroid |
| postcode | Postal code centroid (most common) |
| state | Regional centroid |
| country | Country-level centroid |
| unknown | No meaningful coordinates could be resolved |
Most registries provide only loose address fragments, so rooftop accuracy is uncommon.
Postcode-level precision is the typical resolution.
Coordinate formats
Regardless of precision, all available coordinates are returned in decimal, DMM, and DMS formats. Missing coordinates are represented as null.
Geolocation is never meant to supersede the raw address, but to complement it with spatial data for mapping, distance calculations, or regional logic.
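The relationship between the three coordinate formats is plain arithmetic: one degree is 60 minutes and one minute is 60 seconds. The sketch below derives DMM and DMS strings from a decimal value. The function names and the rounding (three decimals for minutes, two for seconds) are assumptions chosen to match the examples in the table above, not a description of the engine's internals.

```python
from typing import Optional


def to_dmm(value: Optional[float], positive: str, negative: str) -> Optional[str]:
    """Decimal degrees to degrees + decimal minutes, e.g. 59°54.834'N."""
    if value is None:
        return None
    hemisphere = positive if value >= 0 else negative
    # Work in integer thousandths of a minute so rounding never yields 60.000'.
    total = round(abs(value) * 60 * 1000)
    degrees, rest = divmod(total, 60 * 1000)
    return f"{degrees}°{rest / 1000:.3f}'{hemisphere}"


def to_dms(value: Optional[float], positive: str, negative: str) -> Optional[str]:
    """Decimal degrees to degrees, minutes, seconds, e.g. 59°54'50.04"N."""
    if value is None:
        return None
    hemisphere = positive if value >= 0 else negative
    # Work in integer hundredths of a second for the same reason.
    total = round(abs(value) * 3600 * 100)
    degrees, rest = divmod(total, 3600 * 100)
    minutes, hundredths = divmod(rest, 60 * 100)
    return f"{degrees}°{minutes}'{hundredths / 100:.2f}\"{hemisphere}"


lat_dmm = to_dmm(59.9139, "N", "S")
lon_dms = to_dms(10.7522, "E", "W")
```

Passing None through unchanged matches the rule that missing coordinates are represented as null in every format.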
Real-world example
A representative normalised address from Norway shows how the blocks fit together: the raw registry value is preserved in value, the normalized block provides a clean, multi-line format, the components block breaks the address down into structured parts, and the geo block provides inferred coordinates with postcode-level precision.
Dates
Every date returned by our engine includes:
| Field | Description | Example |
|---|---|---|
| date | Raw date string from the registry | 2020-05-15 |
| normalized | Normalised timestamp and timezone data | See below |
Raw date (date)
- Always comes from the registry
- Always normalised to YYYY-MM-DD format
- Contains no time or timezone
- Is the authoritative ground truth
Normalised timestamp (normalized)
Because registries often do not provide timestamps or timezones, our engine infers these values to produce a consistent temporal representation.
| Field | Description | Example |
|---|---|---|
| utc | UTC timestamp in ISO 8601 format | 2020-05-14T22:00:00+00:00 |
| local | Local timestamp in ISO 8601 format | 2020-05-15T00:00:00+02:00 |
| offset | UTC offset in ±HH:MM format | +02:00 |
| tz | Timezone name (IANA format) | Europe/Oslo |
| ms | Millisecond timestamp (Unix epoch) | 1589493600000 |
- Time is assumed to be 00:00 local time, unless provided by the registry
- Timezone is inferred from the registry's country
- Historical DST rules are applied
This produces clean, predictable temporal values for integration.
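The three rules above can be sketched with Python's standard `zoneinfo` module, which applies historical DST rules from the IANA database. The country-to-timezone table and the function name `normalize_date` are hypothetical stand-ins for illustration; how the engine actually maps countries to timezones is not specified here.

```python
from datetime import datetime, time, timezone
from typing import Dict, Optional
from zoneinfo import ZoneInfo

# Hypothetical lookup: a real mapping would cover many more countries and
# decide what to do for countries that span several timezones.
COUNTRY_TZ: Dict[str, str] = {"NO": "Europe/Oslo", "GB": "Europe/London"}


def normalize_date(raw_date: str, country_code: str) -> Optional[dict]:
    """Build utc / local / offset / tz / ms from a raw YYYY-MM-DD date."""
    tz_name = COUNTRY_TZ.get(country_code)
    if tz_name is None:
        return None
    day = datetime.strptime(raw_date, "%Y-%m-%d").date()
    # Assume 00:00 local time because the registry gave no time of day.
    local = datetime.combine(day, time(0, 0), tzinfo=ZoneInfo(tz_name))
    utc = local.astimezone(timezone.utc)
    # Format the UTC offset as ±HH:MM.
    total_minutes = int(local.utcoffset().total_seconds()) // 60
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return {
        "utc": utc.isoformat(),
        "local": local.isoformat(),
        "offset": f"{sign}{hours:02d}:{minutes:02d}",
        "tz": tz_name,
        "ms": int(local.timestamp()) * 1000,
    }


result = normalize_date("2020-05-15", "NO")
```

For a Norwegian date in May the inferred offset is +02:00 because summer time applies; the same call for a January date gives +01:00.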
Recommended usage
- Use value and date (raw) for audit or compliance-limited workflows
- Use components when your UI or logic depends on specific elements (postcode, street, country)
- Treat all inferred fields as conveniences, not authoritative data
- Use normalized.utc or normalized.ms for sorting and consistent cross-country comparisons
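As a small illustration of the last recommendation, the sketch below sorts records by the normalised millisecond timestamp. The record layout (a `since` object holding `normalized.ms`) and the helper name `sort_by_date` are assumptions for the example; entries without a usable timestamp are placed last.

```python
from typing import Any, List, Mapping


def sort_by_date(records: List[Mapping[str, Any]], field: str = "since") -> List[Mapping[str, Any]]:
    """Sort by normalized.ms, oldest first, keeping undated records at the end."""
    def sort_key(record: Mapping[str, Any]):
        entry = record.get(field) or {}
        ms = (entry.get("normalized") or {}).get("ms")
        # Tuples sort False before True, so dated records come first.
        return (ms is None, ms if ms is not None else 0)

    return sorted(records, key=sort_key)


records = [
    {"id": "b", "since": {"date": "2021-01-01", "normalized": {"ms": 200}}},
    {"id": "c", "since": None},
    {"id": "a", "since": {"date": "2020-01-01", "normalized": {"ms": 100}}},
]
ordered_ids = [r["id"] for r in sort_by_date(records)]
```

Sorting on ms avoids comparing local date strings from registries in different timezones.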