CNN-based object detection
The camera module of the AiCP system detects objects in the scene using a CNN-based object detection algorithm, You Only Look Once (YOLO) [58]. It predicts the class of an object and outputs a rectangular bounding box specifying the object location at detection time \(\delta t_{D}\). \(B(I_{yolo})\) contains the bounding boxes \(\{bb^1; bb^2; \ldots; bb^N\}\) of the top-scored candidate class for all N objects detected in the acquired image \(I_{yolo}\). Each \(bb^{n}\) has four parameters: the centroid coordinates \(b_{xc}^{n},b_{yc}^{n}\) of the detected top-scored candidate class, along with the width \((b_{w}^{n})\) and height \((b_{h}^{n})\), expressed as,
$$\begin{aligned} B(I_{yolo}(x,y,t))=\{bb^1; bb^2; \ldots; bb^N\}{, } \end{aligned}$$
(1a)
$$\begin{aligned} bb^{n}=\{b_{xc}^{n},b_{yc}^{n},b_{w}^{n},b_{h}^{n}\}{.} \end{aligned}$$
(1b)
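The construction of \(B(I_{yolo})\) in Eq. (1) can be sketched in a few lines of Python. The detector interface assumed below (a list of per-object class scores and center-format boxes) is a hypothetical stand-in for the actual YOLO output, not the interface used in the AiCP system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BoundingBox:
    # bb^n = {b_xc, b_yc, b_w, b_h} of the top-scored candidate class (Eq. 1b)
    xc: float   # centroid x-coordinate
    yc: float   # centroid y-coordinate
    w: float    # box width
    h: float    # box height
    label: str  # top-scored class name

def build_B(detections: List[dict]) -> List[BoundingBox]:
    """Collect one bounding box per detected object, keeping the top-scored
    candidate class, as in Eq. (1a). The 'detections' layout is an
    illustrative assumption."""
    B = []
    for det in detections:
        # det["scores"] maps class name -> confidence for this object
        top_class = max(det["scores"], key=det["scores"].get)
        xc, yc, w, h = det["box_xywh"]  # center-format box
        B.append(BoundingBox(xc, yc, w, h, top_class))
    return B
```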
Informative masking
The FPP and IPP images are cumulatively generated from the detected objects and their bounding boxes \(B(I_{yolo}(x,y,t))\); they are then passed to the DLP projector and projected as a series of modulated colored planes onto the respective objects. This is expressed as,
$$\begin{aligned} P_{fpp}(x,y,t)&= \int _{0}^{\frac{T}{2}} \lambda _{t}\alpha _{t}(u,v,t)dt{ , } \end{aligned}$$
(2a)
$$\begin{aligned} P_{ipp}(x,y,t)&=\int _{\frac{T}{2}}^{T} \lambda _{t}\alpha ^{'}_{t}(u,v,t)dt{ . } \end{aligned}$$
(2b)
\(P_{fpp}(x,y,t)\) and \(P_{ipp}(x,y,t)\) are the forward and inverse projection light planes that encode the information of the pointed bounding boxes of the objects \(B(I_{yolo}(x,y,t))\). The values of \(P_{fpp}(x,y,t)\) and \(P_{ipp}(x,y,t)\) are determined by the combination of the color wheel filter, the temporal dithering, and \(F_{cf}\) of the DLP projector in place. \(\alpha _{t}(u,v,t)\) is the DMD mirror angle at mirror position (u, v), and \(\lambda _{t}\) is the color-filter wavelength emitted during time dt. The mirror angle \(\alpha _{t}\) of all DMD mirrors for \(P_{ipp}\) is always complementary to that for \(P_{fpp}\) within each information packet. The angle of the DMD mirror at position (u, v) is determined by the image plane decided by the OPC, as listed in Table 4.
Thus, the spatiotemporal packet of information is generated based on the following condition,
$$\begin{aligned} D(x,y,T)= \left\{ {\begin{array}{ll} P_{fpp}(x,y,t) + P_{ipp}(x,y,t),&{} \text {if} \,\, T \le \dfrac{1}{F_{cf}} \\ 0, &{} \text {otherwise} { . } \end{array}}\right. \end{aligned}$$
(3)
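A minimal sketch of this complementary plane generation is given below: the forward plane carries the coded bounding-box mask, and the inverse plane is its complement so that the two halves of a packet sum to a uniform level. The 8-bit NumPy representation and the fixed code per object are illustrative assumptions, not the projector's actual DMD control path.

```python
import numpy as np

def make_packet_planes(shape, boxes, codes, max_level=255):
    """Return (P_fpp, P_ipp): a forward plane carrying the per-object code
    inside each bounding box, and its complementary inverse plane."""
    h, w = shape
    fpp = np.zeros((h, w), dtype=np.uint8)
    for (xc, yc, bw, bh), code in zip(boxes, codes):
        x0, y0 = int(xc - bw / 2), int(yc - bh / 2)
        x1, y1 = int(xc + bw / 2), int(yc + bh / 2)
        fpp[max(y0, 0):y1, max(x0, 0):x1] = code
    ipp = max_level - fpp  # complementary plane for the same packet
    return fpp, ipp

# Example: two detected objects with codes 64 and 128 on a 600x800 plane.
fpp, ipp = make_packet_planes((600, 800),
                              boxes=[(200, 150, 80, 60), (500, 400, 120, 90)],
                              codes=[64, 128])
assert np.all(fpp.astype(int) + ipp.astype(int) == 255)  # planes sum to a constant
```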
In this way, the AiCP system points the informative color mask at each detected object at high speed while remaining imperceptible to the HVS.
High-speed vision based recognition and localization
The HFR camera receiver’s frame rate is set to 480 fps (the same as \(P_{f}\) of the HSP) for frame-by-frame decoding. The HFR camera acquires each plane of the projected packets in sequence and computes the changes in projection intensity within the packet, leading to decoding. Hence, the HFR camera relies on the photometric properties of the high-speed projector rather than on geometric calibration [13]. The nature of the projected light, based on the projection device’s principle and the light reflectance from the projected surface, is estimated and decoded using the HFR camera.
Image acquisition
The pixel position of the projected header data is manually assigned to assist the HFR camera in understanding the color filter cycle and the projection sequence cycle. Images are acquired simultaneously, corresponding to the projected image. Initially, the HFR camera looks for the blue-filter data of the first projection sequence emitted by the projector. The HFR camera is interfaced with a function generator to acquire images uniformly. The acquired image is expressed as,
$$\begin{aligned} I_k(x,y,\delta _{t})=D(x,y,\delta _{t})s(x,y,\lambda _{\delta _{t}}) {, } \end{aligned}$$
(4)
where k and \(\delta _t\) are the frame number and time interval of the monochrome HFR camera at 480 fps. The x- and y-coordinates correspond to the HFR pixel position. Note that, for acquiring all the color filter segment data, \(\delta _t\) is \(\frac{1}{480}\,s=2.083\,ms\). As mentioned previously, \(D(x,y,\delta _t)\) is the projected information observed over the duration \(\delta _t\), and \(s(x,y,\lambda _{\delta _{t}})\) is the spectral reflectance from the surface of the object in the projected area at time \(\delta _{t}\).
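The image formation model of Eq. (4) can be simulated directly: each acquired frame is the elementwise product of the projected plane and the surface spectral reflectance, sampled every \(\delta _t = 1/480\) s. The NumPy arrays below are placeholders for real acquisitions, used only to illustrate the sensing model.

```python
import numpy as np

FPS = 480
DELTA_T = 1.0 / FPS  # 2.083 ms per color-filter segment

def acquire_frame(D_plane: np.ndarray, s_reflectance: np.ndarray) -> np.ndarray:
    """Eq. (4): observed frame I_k = D(x,y,dt) * s(x,y,lambda_dt),
    i.e. the projected plane attenuated by the surface reflectance."""
    return D_plane.astype(np.float32) * s_reflectance

# Illustrative data: one projected plane and a reflectance map in [0.2, 1.0].
D_plane = np.random.randint(0, 256, (600, 800)).astype(np.float32)
s_reflectance = np.clip(np.random.rand(600, 800), 0.2, 1.0)
I_k = acquire_frame(D_plane, s_reflectance)
```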
Sequential thresholding
The acquired frames are sequentially thresholded by binarizing the projected and non-projected areas in the scene. The binarization of an image \(I_k(x,y,\delta _{t})\) at time \(\delta _{t}\) with the threshold \(\theta \) is represented as,
$$\begin{aligned} B_k(x, y, \delta _{t})= \left\{ {\begin{array}{ll} 1,&{} \text {if} \quad I_k(x, y, \delta _{t})\ge \theta \\ 0, &{} \text {otherwise} { . } \end{array}}\right. \end{aligned}$$
(5)
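A sketch of this binarization in NumPy is shown below, assuming the frames are available as 2-D arrays; the threshold value is an illustrative choice, not the value used in the system.

```python
import numpy as np

def binarize(I_k: np.ndarray, theta: float = 30.0) -> np.ndarray:
    """Eq. (5): 1 where the projected intensity reaches the threshold theta,
    0 in the non-projected areas."""
    return (I_k >= theta).astype(np.uint8)
```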
Sequential weighting and accumulation
White pixels in the thresholded image plane are weighted based on the status of the projection and color filter sequence and accumulated into a packet of information. The decoding plane \(I_{dec}(x,y,T)\) over the overall decoding time T is represented as,
$$\begin{aligned} I_{dec}(x, y, T)=\sum _{P_{s}=1}^{N_{p}} \sum _{C_{s}=0}^{C_{f}-1}{2}^{(P_{s}C_{s})} \int _{0}^{T}B_k(x, y, \delta _{t})dt { , } \end{aligned}$$
(6)
where \(N_{p}\) is the total number of projection sequences within an accumulation time T, \(C_{s}\) is the color sequence (with \(C_{f}\) color-filter segments), and \(I_{dec}\) represents the labeled decoded information of each non-zero pixel.
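The weighted accumulation of Eq. (6) can be sketched as a double loop over the projection and color sequences; frames[p][c] below stands for the thresholded frame \(B_k\) captured for projection sequence p and color sequence c, an assumed buffer layout rather than the system's actual one.

```python
import numpy as np

def decode_plane(frames, N_p: int, C_f: int) -> np.ndarray:
    """Eq. (6): accumulate the binarized frames B_k with weights
    2^(P_s * C_s) into the decoding plane I_dec(x, y, T)."""
    I_dec = np.zeros_like(frames[0][0], dtype=np.int64)
    for P_s in range(1, N_p + 1):
        for C_s in range(C_f):
            I_dec += (2 ** (P_s * C_s)) * frames[P_s - 1][C_s]
    return I_dec
```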
Pixels with the same value are segregated based on the recognition identity; hence, the HFR vision system can decode the spatiotemporally transmitted information mapped onto the objects pointed by the AiCP system.
Localization of pointed objects
To determine the trajectory of each labeled object in the projection area, we calculate the zeroth- and first-order moments of \(I_{dec}\) as,
$$\begin{aligned} M_{pq}(I_{dec}(T)) = \sum _{(x,y)\in I_{dec}(T)}(x^{p}y^q I_{dec}(x, y, T)) { . } \end{aligned}$$
(7)
The zeroth- and first-order moments are used to calculate the decoded area (\(O_{area}\)) and centroid (\(O_{xy}\)) of the decoding plane \((I_{dec}(T))\) corresponding to each object after the accumulation time T,
$$\begin{aligned} O_{area}(I_{dec}(T))&= M_{00}(I_{dec}(T)) { , } \end{aligned}$$
(8a)
$$\begin{aligned} O_{xy}(I_{dec}(T))&= \Big ( \frac{M_{10}(I_{dec}(T))}{M_{00}(I_{dec}(T))},\frac{M_{01} (I_{dec}(T))}{M_{00}(I_{dec}(T))}\Big ){, } \end{aligned}$$
(8b)
where \(M_{00}\), \(M_{10}\), and \(M_{01}\) are the summations of the decoded pixels, their x-positions, and their y-positions, respectively, over the decoded regions in \(I_{dec}(T)\). The decoded regions are labeled on the basis of the OPC. In this way, the HFR vision system decodes the visible light information and localizes the objects as \(O_{xy}(I_{dec}(T))\); it determines their trajectories, pointed by the bounding boxes \(B(I_{yolo})\) of each detected object, in the AiCP-encoded system.
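The per-object localization of Eqs. (7) and (8) corresponds to standard image moments; a minimal NumPy sketch is given below, where each non-zero label in \(I_{dec}\) is treated as one decoded object. The binary per-label mask is a simplifying assumption used only for illustration.

```python
import numpy as np

def localize_objects(I_dec: np.ndarray):
    """Compute the area (Eq. 8a) and centroid (Eq. 8b) of each decoded label
    in the decoding plane via zeroth- and first-order moments (Eq. 7)."""
    results = {}
    ys, xs = np.indices(I_dec.shape)
    for label in np.unique(I_dec[I_dec > 0]):
        mask = (I_dec == label).astype(np.float64)
        M00 = mask.sum()          # zeroth-order moment: decoded area
        M10 = (xs * mask).sum()   # first-order moment in x
        M01 = (ys * mask).sum()   # first-order moment in y
        results[int(label)] = {"area": M00,
                               "centroid": (M10 / M00, M01 / M00)}
    return results
```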