In short - I've found that two methods of decoding H264 give quite different results. Both ends of the pipeline are Raspberry Pi 4.
V4L method (latency builds up to 10 seconds):
libcamera@30 fps > TX program (uses V4L) > UDP over lossy network > RX program (uses V4L) > framebuffer

LAVC method (remains responsive):
libcamera@30 fps > TX program (uses V4L) > UDP over lossy network > RX program (uses LAVC) > framebuffer
I would happily forget pipeline 1 and use pipeline 2, but as I understand it, V4L is the "official" way one is supposed to decode H264 (other people might use it), and of course software decoding eats 100% of one CPU core on the receiver, even at a modest resolution of 640 x 480.
I have firmly excluded latency on the transmitter: when I restart the receiving side, latency reverts to low and then builds up again. I have also excluded bugs in the UDP receiver code, since the same receiver with an LAVC decoder works fine. Can someone guess why V4L would develop multi-second latency during prolonged operation?
Some definitions of V4L variables:
Code:
// Video4Linux file descriptor.
int v4l_fd;

// Video4Linux capabilities structure.
struct v4l2_capability v4l_capa;

// Video4Linux format.
struct v4l2_format v4l_format;

// A buffer for interacting with Video4Linux, containing:
// start pointer, length field, inner buffer, data plane.
struct v4l_buffer {
    void* start;
    int length;
    struct v4l2_buffer inner; // index, type, bytesused, flags, field,
                              // timestamp, timecode, sequence, memory,
                              // [union] m (offset, userptr, planes, fd),
                              // length, reserved, reserved2
    struct v4l2_plane plane;  // bytesused, length,
                              // [union] m (mem_offset, userptr, fd),
                              // data_offset, reserved
};

// Video4Linux request for buffers.
struct v4l2_requestbuffers v4l_reqbuf;

// Two instances of a V4L buffer: one for output to the decoder,
// the other for capturing decoder output.
struct v4l_buffer output_to_v4l;
struct v4l_buffer capture_from_v4l;

Next, the function that I can switch over from one pipeline to the other, which makes all the difference. It is fed with reassembled encoded frames. "Reassembled" means that while some encoded frames might never arrive at the decoder, those that do arrive in the correct order and have been checksummed against corruption. Visible deterioration of the video occurs when I-frames are lost, but the loss of P-frames passes almost undetectably.

Code:
void decode_rx_data(uint8_t* p_data, size_t len)
{
    // Works, no perceivable latency.
    if (decode_method == DECODE_METHOD_LAVC) {
        // Prepare packet for decoding.
        avpkt.size = len;
        avpkt.data = p_data;

        // Send packet to decoder.
        avcodec_send_packet(avcontext, &avpkt);

        // Get decoded frame.
        int res = avcodec_receive_frame(avcontext, avframe);
        if (res < 0) {
            printf("Cannot decode frame.\n");
        } else {
            // Decoded; convert from YUV420 to BGRA straight into the framebuffer.
            sws_scale(sws_context, avframe->data, avframe->linesize, 0,
                      STREAM_FRAME_H, sws_out_planes, sws_out_linesize);
            rx_frameno++;
        }

        // Release.
        av_packet_unref(&avpkt);
    }

    // Starts responsive, but slows down to unbearable.
    if (decode_method == DECODE_METHOD_V4L) {
        // Query the output buffer, check if we can give input.
        ioctl(v4l_fd, VIDIOC_QUERYBUF, &output_to_v4l.inner);
        if (output_to_v4l.inner.flags & V4L2_BUF_FLAG_DONE) {
            // Dequeue the V4L output buffer.
            ioctl(v4l_fd, VIDIOC_DQBUF, &output_to_v4l.inner);

            // Copy received data to the buffer start pointer,
            // stating how much data is provided.
            memcpy(output_to_v4l.start, p_data, len);
            output_to_v4l.plane.bytesused = len;

            // Queue the buffer for decoding.
            ioctl(v4l_fd, VIDIOC_QBUF, &output_to_v4l.inner);
        } else {
            printf("Cannot send encoded data to V4L.\n");
        }

        // Query the capture buffer, check if we can get output.
        ioctl(v4l_fd, VIDIOC_QUERYBUF, &capture_from_v4l.inner);
        if (capture_from_v4l.inner.flags & V4L2_BUF_FLAG_DONE) {
            // Dequeue the V4L capture buffer.
            ioctl(v4l_fd, VIDIOC_DQBUF, &capture_from_v4l.inner);

            // Get decoded length.
            size_t decoded_len = capture_from_v4l.inner.m.planes[0].bytesused;
            uint8_t* decode_p = (uint8_t*) capture_from_v4l.start;

            // Copy the image to the framebuffer (don't overwrite the metadata box).
            memcpy(framebuf_p + (DATABOX_W * DATABOX_H * FB_BPP), decode_p, decoded_len);

            // Queue the capture buffer again.
            ioctl(v4l_fd, VIDIOC_QBUF, &capture_from_v4l.inner);
            rx_frameno++;
        } else {
            printf("Cannot get decoded data.\n");
        }
    }
}

Guess: maybe I'm doing something wrong with the VIDIOC_QUERYBUF, VIDIOC_QBUF and VIDIOC_DQBUF calls?

More info: a fragment showing how I initialize V4L on the receiver:
Code:
// Open a Video4Linux file descriptor.
if ((v4l_fd = open("/dev/video10", O_RDWR)) < 0) {
    elog("Failed opening V4L file descriptor.\n");
    exit(EXIT_FAILURE);
}

// Clear the format structure.
memset(&v4l_format, 0, sizeof(v4l_format));

// DISABLED, made no difference.
// Demand the H264 decoder to return a frame EVERY time.
// ioctl(v4l_fd, V4L2_CID_MPEG_MFC51_VIDEO_DECODER_H264_DISPLAY_DELAY_ENABLE, true);
// ioctl(v4l_fd, V4L2_CID_MPEG_MFC51_VIDEO_DECODER_H264_DISPLAY_DELAY, 1);

// Output stream: get the default format, then set the needed format.
v4l_format.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
ioctl(v4l_fd, VIDIOC_G_FMT, &v4l_format);
v4l_format.fmt.pix_mp.width = FRAME_W;
v4l_format.fmt.pix_mp.height = STREAM_FRAME_H;
v4l_format.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_H264;
v4l_format.fmt.pix_mp.field = V4L2_FIELD_NONE;
ioctl(v4l_fd, VIDIOC_S_FMT, &v4l_format);

// Capture stream: get the default format, then set the needed format.
v4l_format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
ioctl(v4l_fd, VIDIOC_G_FMT, &v4l_format);
v4l_format.fmt.pix_mp.width = FRAME_W;
v4l_format.fmt.pix_mp.height = STREAM_FRAME_H;
// Caution: if you request something other than BGR32, you get YUV420.
v4l_format.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_BGR32;
v4l_format.fmt.pix_mp.field = V4L2_FIELD_NONE;
ioctl(v4l_fd, VIDIOC_S_FMT, &v4l_format);

// Get the values again to examine them.
ioctl(v4l_fd, VIDIOC_G_FMT, &v4l_format);
// Check if you got the requested pixel format.
// if (v4l_format.fmt.pix_mp.pixelformat == V4L2_PIX_FMT_XBGR32)

// Memory map V4L buffers.
v4l_reqbuf.memory = V4L2_MEMORY_MMAP;
v4l_reqbuf.count = 1;
v4l_reqbuf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
ioctl(v4l_fd, VIDIOC_REQBUFS, &v4l_reqbuf);
map(v4l_fd, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE, &output_to_v4l);     // a wrapper around mmap()

v4l_reqbuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
ioctl(v4l_fd, VIDIOC_REQBUFS, &v4l_reqbuf);
map(v4l_fd, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE, &capture_from_v4l); // a wrapper around mmap()

// Queue the mapped buffers, granting the decoder exclusive access.
ioctl(v4l_fd, VIDIOC_QBUF, &capture_from_v4l.inner);
ioctl(v4l_fd, VIDIOC_QBUF, &output_to_v4l.inner);

// Start the V4L conveyor with the VIDIOC_STREAMON call.
// The decoder will now read from the "output to V4L" buffer
// and put decoded frames into the "capture from V4L" buffer.
int type_output = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
ioctl(v4l_fd, VIDIOC_STREAMON, &type_output);
int type_capture = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
ioctl(v4l_fd, VIDIOC_STREAMON, &type_capture);

I have also looked for an elegant way of "resetting" my decoder pipeline regularly, or whenever latency is detected (I also send timestamps as metadata), but short of releasing all the V4L components, re-allocating and re-starting them, I have not found a remedy.

It's not a pressing issue for me, since LAVC works, but it's one of those puzzles that keeps haunting you after you've chased bugs for a month and didn't catch them. :)
Statistics: Posted by diastrikos — Sun Sep 01, 2024 10:57 am — Replies 1 — Views 47