Open-Source Low-Complexity Perceptual Video Quality Measurement with pVMAF 

With the rise of digital video services, viewers expect high-quality visuals, making Quality of Experience (QoE) a priority for providers. However, poor video processing can degrade visual quality, leading to detail loss and visible artifacts. Thus, accurately measuring perceptual quality is essential for monitoring QoE in digital video services. While viewer opinions are the most reliable measure of video quality, subjective testing is impractical due to its time, cost, and logistical demands. As a result, objective video quality metrics are commonly used to assess perceived quality. These models evaluate a distorted video and predict how viewers might perceive its quality. Metrics that compare the distorted video to the original source, known as full-reference (FR) metrics, are regarded as the most accurate approach. Traditional quality metrics like Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), and Peak Signal-to-Noise Ratio (PSNR) are computationally lightweight and commonly used within encoders for Video Quality Measurement (VQM) and other encoder optimization tasks. However, methods that simply measure pixel-wise differences often lack alignment with human perception as they do not account for the complex intricacies of the Human Visual System (HVS).
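To make the distinction concrete, the pixel-wise metrics mentioned above reduce to simple per-frame arithmetic. The sketch below (an illustrative implementation, not code from the paper) computes SAD, SSD, and PSNR between a reference and a distorted frame; note that PSNR is just a logarithmic rescaling of SSD, which is why none of these metrics can model HVS effects such as masking:

```python
import numpy as np

def sad(ref: np.ndarray, dist: np.ndarray) -> float:
    """Sum of Absolute Differences between two frames."""
    return float(np.abs(ref.astype(np.float64) - dist.astype(np.float64)).sum())

def ssd(ref: np.ndarray, dist: np.ndarray) -> float:
    """Sum of Squared Differences between two frames."""
    return float(((ref.astype(np.float64) - dist.astype(np.float64)) ** 2).sum())

def psnr(ref: np.ndarray, dist: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB (8-bit video assumed by default)."""
    mse = ssd(ref, dist) / ref.size
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For example, two 8-bit frames whose pixels all differ by exactly 1 yield an MSE of 1 and hence a PSNR of about 48.1 dB, regardless of where in the frame the differences fall, which is exactly the perceptual blindness described above.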

In recent years, more advanced metrics have been developed to better reflect human perception by incorporating HVS characteristics. Among these, Video Multi-method Assessment Fusion (VMAF) has become a widely accepted industry standard for evaluating video quality due to its high correlation with subjective opinions. However, the high computational demand of VMAF and similar perception-based metrics limits their suitability for real-time VQM. Consequently, encoders primarily offer only PSNR and Structural Similarity Index Measure (SSIM) for full-frame quality monitoring during encoding. While not the most accurate, these metrics are the only options that can be efficiently deployed during live encoding, as more advanced VQM approaches would consume too many of the processing resources needed for real-time encoding. To address these limitations, we introduced predictive VMAF (pVMAF), a novel video quality metric that achieves predictive accuracy similar to VMAF at a fraction of the computational cost, making it suitable for real-time applications.

pVMAF relies on three categories of low-complexity features: (i) bitstream features, (ii) pixel features, and (iii) elementary metrics. Bitstream features include encoding parameters like the quantization parameter (QP), which provide insight into compression. Pixel features are computed on either the original or reconstructed frames to capture video attributes relevant to human perception, such as blurriness and motion. Finally, elementary metrics, such as PSNR, contribute additional distortion information. These features are extracted during encoding and fed into a regression model that predicts frame-by-frame VMAF scores. Our regression model, a shallow feed-forward neural network, is trained to replicate VMAF scores based on these input features. pVMAF was initially designed for H.264/AVC; we have since extended its applicability to more recent compression standards such as HEVC and AV1. In this paper, we explain how we developed and retrained pVMAF for x264 and SVT-AV1. Experimental results indicate that pVMAF replicates VMAF predictions with high accuracy while maintaining high computational efficiency, making it well-suited for real-time quality measurement.
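The regression stage described above can be sketched as a one-hidden-layer feed-forward network that maps a per-frame feature vector to a VMAF score. The feature names, layer width, and weights below are illustrative assumptions, not the trained pVMAF model:

```python
import numpy as np

# Hypothetical per-frame feature vector drawn from the three categories:
# bitstream features (mean QP, bits per frame), pixel features (blur, motion),
# and an elementary metric (PSNR). Names and count are assumptions.
FEATURES = ["mean_qp", "bits_per_frame", "blur", "motion", "psnr"]

class ShallowVMAFRegressor:
    """One-hidden-layer feed-forward network mapping features to a VMAF-like score.

    Weights are randomly initialized here for illustration; in practice the
    network would be trained to replicate VMAF scores on encoded content.
    """

    def __init__(self, n_in: int, n_hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def predict(self, x: np.ndarray) -> float:
        h = np.maximum(0.0, x @ self.W1 + self.b1)  # ReLU hidden layer
        y = h @ self.W2 + self.b2                   # linear output unit
        return float(np.clip(y[0], 0.0, 100.0))    # clamp to VMAF's [0, 100] range
```

Because the network is this small and all features are already available during encoding, one forward pass per frame adds negligible cost next to the encode itself, which is the source of pVMAF's efficiency advantage over computing full VMAF.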

Jan De Cock, Axel De Decker, Sangar Sivashanmugam | Synamedia | Kortrijk, Belgium
