Why 731 Sensing-Enabled DOOH Screens Are Changing Programmatic Measurement
**TL;DR:** Between January 1 and May 11, 2026, 731 sensing-enabled DOOH screens fundamentally altered the physics of audience measurement. By analyzing frames at 30 FPS and calculating gaze vectors with a strict 25-degree cone tolerance, computer-vision hardware shifts the industry from panel-based probabilistic models to deterministic, per-impression telemetry. This analysis breaks down the mechanics of attention scoring, 1-second dwell resolution, and the custom segtax=600 schema that translates optical reality into actionable bidstream signals for measurement auditors and integration engineers.
## The Physics of Deterministic Measurement
For decades, the standard for measuring physical audiences relied on probabilistic panels. Legacy methodologies deployed by traditional auditing bodies utilized historical cellular data, manual traffic counts, and limited demographic sampling to estimate the volume of humans passing a given coordinate. These models generated a static multiplier—a theoretical number of impressions credited to every ad play, regardless of whether a human was actually present and looking at the screen at that exact millisecond.
Between January 1 and May 11, 2026, our network telemetry captured a paradigm shift. Across exactly 731 sensing-enabled screens, the methodology of audience measurement transitioned from theoretical modeling to deterministic optical physics. This is not an extrapolation of historical traffic; it is the real-time, edge-computed reality of human presence, captured and quantified frame by frame.
The core of this transformation is high-frequency frame sampling. The optical sensors embedded within these 731 enclosures operate at a continuous 30 FPS (frames per second). This temporal resolution is critical. At 30 FPS, the system is not merely taking periodic snapshots to count heads; it is tracking the fluid dynamics of human movement through a physical space. When a pedestrian walks past a digital display, their trajectory, velocity, and orientation are captured across dozens, sometimes hundreds, of sequential frames.
This continuous temporal tracking enables the transition from simple face detection to complex gaze vectoring. Detecting a face in a spatial environment is a solved problem in computer vision. However, a detected face does not equal an impression. If a consumer is walking past a screen but looking at their smartphone, their face is present, but their attention is zero. To solve this, the sensing architecture utilizes a strict 25-degree gaze cone tolerance.
## Gaze Vectors and the 25-Degree Cone Tolerance
The mathematics of gaze tracking require mapping a three-dimensional head pose onto a two-dimensional screen plane. When the edge-inference module detects a face, it immediately isolates facial landmarks—specifically the pupils, the bridge of the nose, and the corners of the mouth. By calculating the pitch, yaw, and roll of the head relative to the known focal length and physical offset of the camera lens, the system casts a geometric ray outward from the consumer's eyes.
This ray is not a single infinitesimally thin line, as human vision relies on peripheral awareness and rapid saccadic movements. Instead, the system projects a volumetric cone. Through rigorous calibration against [MRC standards](https://www.mrc-online.org/our-standards/) for viewability and attention, the hardware enforces a 25-degree gaze cone tolerance.
If the physical dimensions of the digital screen intersect with this 25-degree volumetric projection, the system registers a valid "look." If the consumer's head turns so that the screen falls 26 degrees off the gaze axis—perhaps to glance at a companion or a storefront—the screen exits the cone, and the attention metric is immediately paused. At 30 FPS, this calculation occurs roughly every 33 milliseconds. The precision of this 25-degree gaze cone tolerance ensures that measurement auditors are evaluating actual visual engagement rather than incidental physical proximity, and it deprecates the legacy panel methodology in which a consumer walking by with their back turned was counted as a monetizable impression.
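To make the geometry concrete, here is a minimal sketch of the cone-intersection test, assuming head pose arrives as yaw/pitch in camera coordinates and reducing the screen to a single reference point; the production pipeline evaluates the full screen plane with calibrated lens offsets, and the function names and axis conventions here are this example's assumptions.

```python
import numpy as np

CONE_HALF_ANGLE_DEG = 25.0  # the gaze cone tolerance described above


def gaze_direction(yaw_deg: float, pitch_deg: float) -> np.ndarray:
    """Unit gaze vector from head yaw/pitch (roll ignored for this sketch)."""
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    return np.array([
        np.cos(pitch) * np.sin(yaw),  # x: left/right
        np.sin(pitch),                # y: up/down
        np.cos(pitch) * np.cos(yaw),  # z: toward the scene
    ])


def screen_in_cone(head_pos, screen_point, yaw_deg, pitch_deg) -> bool:
    """True if the ray to screen_point lies within the 25-degree gaze cone."""
    to_screen = np.asarray(screen_point, float) - np.asarray(head_pos, float)
    to_screen /= np.linalg.norm(to_screen)
    cos_angle = np.clip(np.dot(gaze_direction(yaw_deg, pitch_deg), to_screen), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle)) <= CONE_HALF_ANGLE_DEG


# Head at the origin, screen point 2 m ahead and 0.5 m to the side sits about
# 14 degrees off-axis, so a 10-degree head turn keeps it inside the cone.
print(screen_in_cone([0, 0, 0], [0.5, 0, 2.0], yaw_deg=10, pitch_deg=0))  # True
```

Run per frame, this boolean stream is exactly what feeds the contiguous-frame dwell counting described in the next section.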
## Attention Scoring and Dwell-Time Math
Once a valid gaze intersection is established, the telemetry must quantify the duration and quality of that engagement. This requires rigorous dwell-time math. Because the system tracks the consumer across sequential frames, dwell time is calculated by counting the contiguous frames where the 25-degree gaze cone intersects the screen, divided by the 30 FPS sampling rate.
For example, if a unique anonymous profile is tracked maintaining a valid gaze for 45 contiguous frames, the raw optical dwell time is 1.5 seconds. However, to standardize this telemetry for downstream measurement vendors and integration pipelines, the system applies a 1-second dwell resolution. Using nearest-integer rounding, a raw dwell of 1.5 seconds is reported as 2 seconds, while a raw dwell of 1.4 seconds is reported as 1 second. This 1-second resolution prevents bidstream bloat while maintaining a highly accurate representation of human attention.
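A minimal sketch of that frames-to-seconds conversion follows; the helper name and the explicit guard against Python's banker's rounding are choices of this example, not the production code.

```python
import math

FPS = 30  # sampling rate of the optical sensor


def dwell_seconds(contiguous_gaze_frames: int) -> int:
    """Convert contiguous in-cone frames to a 1-second-resolution dwell value."""
    raw = contiguous_gaze_frames / FPS  # e.g. 45 frames -> 1.5 s raw dwell
    # floor(x + 0.5) gives true nearest-integer rounding; Python's built-in
    # round() uses banker's rounding and would map a raw 2.5 s down to 2.
    return math.floor(raw + 0.5)


assert dwell_seconds(45) == 2  # 1.5 s raw dwell reports as 2 s
assert dwell_seconds(42) == 1  # 1.4 s raw dwell reports as 1 s
```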
These standardized dwell times are then aggregated into 3 distinct attention buckets, which serve as macro-indicators of cognitive load and message absorption (a classification sketch follows the list):
1. **Low Attention (Glance):** Dwell times of 1 to 2 seconds. This bucket indicates that the consumer registered the screen's presence and processed the highest-level visual hierarchy (e.g., brand color, primary logo), but did not engage with the secondary copy.
2. **Medium Attention (Dwell):** Dwell times of 3 to 5 seconds. This bucket represents active reading or sustained observation. The consumer has committed cognitive resources to the creative payload.
3. **High Attention (Fixation):** Dwell times of 6 seconds or greater. This bucket is relatively rare in transit environments but highly valuable, indicating deep engagement, often correlated with complex dynamic creative or interactive utility.
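A minimal bucketing helper over the thresholds above; the string labels are illustrative placeholders, not the schema's canonical values.

```python
def attention_bucket(dwell_s: int) -> str:
    """Map a 1-second-resolution dwell value to one of the 3 attention buckets."""
    if dwell_s >= 6:
        return "high_fixation"   # 6 seconds or greater
    if dwell_s >= 3:
        return "medium_dwell"    # 3 to 5 seconds
    if dwell_s >= 1:
        return "low_glance"      # 1 to 2 seconds
    return "no_attention"        # gaze never held for a full rounded second
```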
By categorizing every deterministic impression into one of these 3 attention buckets, measurement frameworks can move beyond simple CPM (Cost Per Mille) models and begin evaluating CPmA (Cost Per Mille Attentive), fundamentally altering how the value of physical real estate is calculated.
## The Segtax=600 Schema: Translating Optical Reality
Capturing the physics of an impression is only the first half of the measurement equation; the second half is translating that physical reality into a standardized semantic payload that external systems can parse.
Historically, digital and mobile ecosystems have relied on the [IAB Tech Lab Audience Taxonomy](https://iabtechlab.com/standards/audience-taxonomy/). Within this standard, there are exactly 1558 IAB Audience Nodes classified under the `segtax=4` identifier. These nodes cover everything from "Auto Intenders" to "Household Income $100k+." However, these 1558 IAB Audience Nodes were designed for browser cookies and mobile device IDs. They are fundamentally probabilistic, derived from historical browsing behavior, and they completely lack the real-time physical context of a consumer standing in front of a digital screen.
To bridge this gap, the network utilizes a custom, purpose-built taxonomy specifically designed for computer-vision-driven out-of-home measurement: the `segtax=600` schema.
The `segtax=600` schema operates completely separately from the traditional buyer/seller waterfall. It is a measurement-vendor schema designed to enrich the bid request with deterministic, edge-computed context. The schema is divided into exactly 6 strictly defined classes, with each detected profile emitting up to 6 declared fields.
### Class 1: Group Composition
Traditional digital measurement assumes a 1:1 ratio between a device and a user. Physical environments routinely violate this assumption. The Group Composition class quantifies the social context of the audience. Is the detected individual navigating the space solo? Are they part of a dyad (couple)? Are they moving in a densely packed peer group or a family unit? A consumer's receptivity to specific messaging—such as a quick-service restaurant family bundle versus a single-serve energy drink—is highly correlated with their group composition at the moment of exposure.
### Class 2: Intent Stage
By analyzing the velocity and trajectory of the consumer across the 30 FPS frame samples, the edge-inference module classifies their physical intent. A consumer moving at a rapid, sustained pace along a primary ingress/egress corridor is classified as "Transit/Commute." A consumer moving slowly, pausing frequently, and exhibiting erratic gaze vectors is classified as "Browsing." A consumer standing entirely stationary in a designated queue is classified as "Waiting/Transacting." This intent stage is a critical multiplier for the 3 attention buckets, as a 2-second glance during a rapid commute requires different creative optimization than a 2-second glance while waiting in line.
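As an illustration, a trajectory-feature classifier for this class might look like the following sketch; the speed and pause thresholds are placeholder assumptions, not the calibrated production values.

```python
def intent_stage(mean_speed_mps: float, pause_ratio: float, in_queue: bool) -> str:
    """Classify physical intent from 30 FPS trajectory features.

    mean_speed_mps: average walking speed across the tracked frames.
    pause_ratio: fraction of frames in which the subject was stationary.
    in_queue: whether the subject stands inside a designated queue zone.
    """
    if in_queue or (mean_speed_mps < 0.1 and pause_ratio > 0.9):
        return "waiting_transacting"
    if mean_speed_mps > 1.2 and pause_ratio < 0.1:
        return "transit_commute"
    return "browsing"
```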
### Class 3: Attire Archetype
Clothing is a primary deterministic indicator of physical context and destination. The `segtax=600` schema utilizes computer vision to categorize the dominant clothing features into standardized buckets, detailed further in our breakdown of [audience archetypes](/research/state-of-dooh-2026/audience-archetypes/). Archetypes include "Business/Formal" (indicating corporate commuting or professional environments), "Activewear" (indicating proximity to fitness centers or recreational intent), "Uniform/Workwear," and "Casual." By aggregating these archetypes, measurement vendors can infer the localized mindset of the audience without relying on invasive mobile location tracking.
### Class 4: Activity Macro
Distinct from the Intent Stage, the Activity Macro classifies the immediate physical posture of the consumer. Are they walking, standing, sitting, or actively interacting with an object (e.g., eating, holding a smartphone, pushing a stroller)? This physical posture directly impacts the expected dwell time and the baseline probability of the screen intersecting the 25-degree gaze cone.
### Class 5: Ethnicity Coarse Bucket
To support diversity and inclusion measurement mandates for specific governmental and localized campaigns, the system utilizes highly aggregated, non-PII demographic estimators to output a coarse ethnicity bucket. This is strictly a macro-level measurement signal, processed entirely at the edge with zero image retention, designed solely to validate that campaign delivery aligns with census-level population distributions in the physical deployment area.
### Class 6: Engagement Narrative
This final class synthesizes the raw optical data into a semantic state. It categorizes the consumer's interaction with the display environment as "Ignoring" (present but gaze vector outside the cone), "Glancing" (transient intersection), "Reading" (sustained intersection with minimal head movement), or "Reacting" (sudden changes in head pose or dwell velocity in response to creative transitions).
Together, these 6 `segtax=600` classes and the resulting 6 declared fields per profile provide a radically dense, deterministic dataset that completely eclipses the probabilistic guesswork of legacy panel models.
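Concretely, one detected profile's declared fields might serialize as below. The `audience_intent_stage` and `audience_activity_macro` key names appear in the FAQ later in this piece; the remaining keys and every value are illustrative assumptions, not the schema's published vocabulary.

```python
# One detected profile, keyed by the six segtax=600 classes.
profile = {
    "audience_group_composition": "dyad",               # Class 1
    "audience_intent_stage": "transit_commute",         # Class 2
    "audience_attire_archetype": "business_formal",     # Class 3
    "audience_activity_macro": "walking",               # Class 4
    "audience_ethnicity_coarse": "aggregate_bucket_3",  # Class 5 (non-PII macro signal)
    "audience_engagement_narrative": "glancing",        # Class 6
}
```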
## Integrating Deterministic Telemetry into the Bidstream
The architectural challenge of this dataset is not merely generating it, but routing it efficiently. During the observation period of January 1 – May 11, 2026, the 731 sensing screens generated immense volumes of data. However, raw optical telemetry is useless if it cannot be parsed by the demand ecosystem.
This is where the translation into a standardized protocol becomes critical. Across the inventory pool, this deterministic data enriches an average of 104,000 auctions per day with audience signals. For DSP integration leads and measurement engineers, ingesting this data requires a deep understanding of the [OpenRTB](https://github.com/InteractiveAdvertisingBureau/openrtb) specification.
Because the IAB `segtax=4` taxonomy is insufficient for physical contexts, the `segtax=600` schema is passed within the `User` or `Device` object extensions (`ext`). When a bid request is generated, the edge computing module aggregates the localized audience data present in the physical space at that exact second. If a group of three individuals is present and each of their gaze cones intersects the screen, the bid request will populate an array of audience objects, each containing the 6 declared fields per profile.
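A minimal sketch of how such a payload could sit in the bid request, assuming the `user.ext` placement described above and mirroring OpenRTB's familiar data/segment structure; the source name, profile IDs, and nesting beyond `"segtax": 600` are hypothetical.

```python
import json

bid_request_fragment = {
    "user": {
        "ext": {
            "data": [
                {
                    "name": "trillboards.com",  # hypothetical data source name
                    "ext": {"segtax": 600},     # custom taxonomy identifier
                    "segment": [
                        # one entry per detected profile, up to 6 declared fields each
                        {
                            "id": "profile-1",
                            "ext": {
                                "audience_attire_archetype": "business_formal",
                                "audience_intent_stage": "transit_commute",
                                "audience_group_composition": "solo",
                                "audience_activity_macro": "walking",
                            },
                        },
                    ],
                }
            ]
        }
    }
}

print(json.dumps(bid_request_fragment, indent=2))
```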
For engineers building in-network DSP integrations, this multi-object array represents a fundamental shift in bidding logic. Traditional digital bidding evaluates a single user profile. Deterministic DOOH bidding requires evaluating an audience composition matrix. If the DSP is tasked with delivering a campaign targeting business commuters, the bidding algorithm must parse the OpenRTB request, iterate through the array of detected profiles, and calculate the density of the "Business/Formal" attire archetype and the "Transit/Commute" intent stage.
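A bidder-side evaluation over that array might look like this sketch; the function names, field keys, and 50% density threshold are assumptions for illustration, not a reference implementation.

```python
def business_commuter_density(profiles: list[dict]) -> float:
    """Fraction of detected profiles matching the campaign's target combination."""
    if not profiles:
        return 0.0
    hits = sum(
        1
        for p in profiles
        if p.get("audience_attire_archetype") == "business_formal"
        and p.get("audience_intent_stage") == "transit_commute"
    )
    return hits / len(profiles)


def should_bid(profiles: list[dict], min_density: float = 0.5) -> bool:
    """Bid only when target-audience density clears the campaign's threshold."""
    return business_commuter_density(profiles) >= min_density
```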
Furthermore, the multi-SSP routing infrastructure must handle this dense payload without exceeding strict latency timeouts. Processing 30 FPS computer vision at the edge ensures that the massive computational load of frame analysis is handled locally at the screen enclosure. By the time the data hits the demand waterfall, it has been compressed from gigabytes of raw video into a lightweight, millisecond-optimized JSON payload detailing the attention buckets and the `segtax=600` classifications.
This separation of concerns—heavy optical processing at the edge, lightweight semantic routing in the cloud—is what allows the network to maintain high availability while processing 104,000 auctions per day with audience data. For a deeper look at how this infrastructure scales globally, review our [network snapshot](/research/state-of-dooh-2026/network-snapshot/).
## The Future of Measurement Auditing
The introduction of these 731 sensing-enabled screens acts as a forcing function for the entire measurement industry. As long as the physical world relied on probabilistic panels, measurement auditors were forced to accept wide margins of error. The transition to deterministic, frame-by-frame telemetry removes that margin.
When a measurement vendor receives a log-level data feed from this architecture, they are no longer looking at a statistical guess. They are looking at a frame-accurate, auditable record of physical events. They can see that at 14:32:05, three individuals were present. They can see that two of them maintained a gaze vector within the 25-degree cone for 4 seconds, registering in the medium attention bucket. They can see the precise `segtax=600` classification of those individuals.
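For instance, the log-level rows for that moment might look like the following; every field name and identifier here is a hypothetical stand-in, not the vendor feed's actual schema.

```python
# Three individuals present at 14:32:05; two in-cone for 4 s (medium bucket).
log_rows = [
    {"ts": "14:32:05", "screen_id": "scr-0417", "in_cone": True,
     "dwell_seconds": 4, "attention_bucket": "medium_dwell"},
    {"ts": "14:32:05", "screen_id": "scr-0417", "in_cone": True,
     "dwell_seconds": 4, "attention_bucket": "medium_dwell"},
    {"ts": "14:32:05", "screen_id": "scr-0417", "in_cone": False,
     "dwell_seconds": 0, "attention_bucket": "no_attention"},
]
```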
This level of granularity demands a complete rewrite of how campaigns are attributed and valued. For buyers accustomed to the traditional digital ecosystem, this represents the long-awaited parity between online and offline measurement, a concept explored further in our primer on [programmatic DOOH 101](/guides/programmatic-dooh-101/). It proves that physical ad space can be measured with the same, if not greater, deterministic rigor as a desktop web browser.
As this technology scales beyond the initial 731 screens, the expectation from the demand side will invariably shift. The demand ecosystem will no longer accept panel-based multipliers when deterministic attention scoring is available. For a comprehensive analysis of how this expectation shift is altering bid density and clearing prices, consult our report on the [demand ecosystem](/research/state-of-dooh-2026/demand-ecosystem/). The standard has been permanently elevated, and the physics of measurement will never regress to the mean.
## FAQ
### How does the 25-degree gaze cone tolerance impact total impression volume?
The strict implementation of a 25-degree gaze cone tolerance generally reduces the absolute volume of gross impressions compared to legacy panel models, but vastly increases the quality and verifiable accuracy of the remaining impressions. By filtering out individuals who are physically present but looking away (e.g., staring at their phones or facing the opposite direction), the telemetry ensures that every recorded impression represents actual optical intersection with the screen. This deflation of "ghost impressions" allows buyers to bid on true attention rather than mere proximity.
### Why use a custom segtax=600 instead of standard IAB taxonomies?
The standard IAB Audience Taxonomy (`segtax=4`), which contains 1558 nodes, was architected for digital environments—specifically browser cookies and mobile device IDs based on historical browsing data. It lacks the vocabulary to describe real-time physical realities. The custom `segtax=600` schema was developed specifically for computer-vision-driven DOOH to capture immediate, observable context—such as group composition, attire archetype, and physical intent stage—which are critical deterministic variables for physical-world measurement that do not exist in online taxonomies.
### How is PII handled when processing 30 FPS frame data?
Privacy is maintained through strict edge-computing architectures. While the optical sensors capture data at 30 FPS to calculate gaze vectors and attention buckets, all computer-vision inference is performed locally on the hardware enclosure. The system extracts anonymous mathematical vectors (such as head pose coordinates and bounding box dimensions) and immediately discards the raw video frames in volatile memory. No photographic images or Personally Identifiable Information (PII) are ever stored, transmitted, or routed into the bidstream.
### What is segtax=600 in DOOH measurement?
`segtax=600` is the Trillboards namespace for audience signals that have no IAB Audience Taxonomy equivalent. It covers six declared classes: group composition, intent stage, attire archetype, activity macro, ethnicity (coarse aggregate), and engagement narrative.
### How many sensing screens does the network have?
731 sensing-enabled screens as of May 11, 2026, observed over the January–May 2026 measurement window. "Sensing-enabled" means the screen carries an on-device CV pipeline that emits face counts, attention level, dwell, and gaze seconds per audience moment.
### How is this different from the IAB Audience Taxonomy?
IAB Audience Taxonomy 1.1 (`segtax=4`) describes roughly 1,558 audience nodes built primarily for digital and CTV. `segtax=600` adds CV-unique DOOH signals that the IAB tree does not encode—for example `audience_intent_stage = consideration` or `audience_activity_macro = transit`—which only an on-screen sensor can observe.