Translating Images into Maps

Unlike previous approaches, we treat the transformation to BEV as an image-to-world translation problem, where the objective is to learn an alignment between vertical scan lines in the image and polar rays in BEV.
Transformers are well-suited to the image-to-BEV transformation problem, as they can reason about the interdependence between objects, depth and the lighting of the scene to achieve a globally consistent representation.

Input: image, intrinsic matrix.

Output: semantic BEV maps for static and dynamic classes.
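
To make the role of the intrinsic matrix concrete, here is a minimal sketch of how an image column can be tied to a polar ray in BEV. It assumes an undistorted pinhole camera; the function names and the sample values of fx and cx are hypothetical and are not taken from the paper.

```python
import math

def column_to_azimuth(u, fx, cx):
    """Azimuth (radians) of the ground-plane ray seen by image column u.

    Pinhole model: u = fx * X / Z + cx, so X / Z = (u - cx) / fx.
    0 is the optical axis; positive angles point to the right.
    """
    return math.atan2(u - cx, fx)

def polar_to_bev(r, azimuth):
    """Convert a point at range r along a ray to Cartesian BEV (x right, z forward)."""
    return r * math.sin(azimuth), r * math.cos(azimuth)

# Hypothetical intrinsics: focal length and principal point in pixels.
fx, cx = 1000.0, 640.0

theta = column_to_azimuth(1640.0, fx, cx)   # column 1000 px right of the principal point
x, z = polar_to_bev(10.0, theta)            # a cell 10 m out along that ray

# Projecting the BEV point back should land on the same column.
u_back = fx * x / z + cx
```

Under this assumption every column owns exactly one ray, which is what makes a per-column translation to a polar BEV grid possible before resampling to a Cartesian map.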


Treat the 1-1 correspondence between each vertical scanline and its associated ray as a seq2seq translation.
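
The scanline-to-ray idea can be sketched as a single cross-attention step applied to each image column independently. This is a toy illustration, not the paper's architecture: it uses one head, no learned projections and no positional encodings, and the names `translate_column`, `image_to_polar_bev` and `ray_queries` are hypothetical. The ray queries stand in for whatever per-depth query representation a real model would learn.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def translate_column(column_feats, ray_queries):
    """Translate one image column (H vectors) into one polar ray (R vectors).

    Each ray cell is a query that attends over all pixels of the column;
    the column features serve as both keys and values in this sketch.
    Returns the ray features and the R x H attention weights.
    """
    d = len(column_feats[0])
    ray_feats, weights = [], []
    for q in ray_queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in column_feats]
        w = softmax(scores)
        weights.append(w)
        ray_feats.append([sum(w_j * v[c] for w_j, v in zip(w, column_feats))
                          for c in range(d)])
    return ray_feats, weights

def image_to_polar_bev(image_feats, ray_queries):
    """Apply the same column-to-ray translation to every column of the image."""
    return [translate_column(col, ray_queries)[0] for col in image_feats]

random.seed(0)
H, W, R, D = 6, 4, 5, 3  # image height, image width, ray cells, feature dim
image_feats = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(H)]
               for _ in range(W)]  # indexed as [column][row][channel]
ray_queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(R)]

polar_bev = image_to_polar_bev(image_feats, ray_queries)  # W rays x R cells x D
_, attn = translate_column(image_feats[0], ray_queries)
```

Each row of `attn` shows which pixel heights a given depth cell draws from, which is the learned alignment between the scanline and its ray.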

??? There's actually an Easter egg ???

Inter-plane attention

I'm not sure whether the paper is clearly written, or whether this is the final version.

Published on 2021-10-17 10:54