Since its founding in 2014, BIGO has focused on providing audio and video services on a global scale. In just five years, BIGO has steadily entered the top ten of the global app revenue rankings. Its global live video community Bigo Live, short video creation platform Likee, and audio and video communication app imo serve more than 400 million users in 150 countries around the world.
Figure: BIGO audio and video product matrix
BIGO’s rapid progress in the global audio and video business is inseparable from its many years of deep accumulation in audio and video technology. Over the past ten years, YY has been a pioneer and leader of audio and video technology in the domestic Internet industry: the great success of YY Voice in 2008, the creation of live audio and video in 2010, the victory of YY Live in the “thousand-broadcast battle” of mobile live streaming in 2017, and the 2018 New York Stock Exchange listing of Huya, the game live streaming service it incubated. By 2020, overseas, BigoLive continues to rank in the top ten of global app revenue and Likee has become the second largest short video platform in the world. All of this is inseparable from strong audio and video technology support. From its inception, BIGO’s audio and video technology has been built on YY’s industry-leading audio and video technology. Faced with the complex and diverse terminal distribution and network environments around the world, it has accumulated further through years of hard work in real scenarios across three very large core businesses, and has developed a total BIGO audio and video solution optimized for both global and local environments.
The core capabilities of BIGO’s global audio and video technology overall solutions include:
a> Large service scale: the audio and video service duration in a single month exceeds 100 billion minutes, among the highest in the world
b> Massive concurrent users: real-time interactive video services with tens of millions of users online at the same time
c> High-quality service: industry-leading QoE/QoS audio and video services in a complex global network environment
d> High cost performance: under the same service conditions, the average cost is only 50% of the general industry level
Next, we analyze BIGO’s ten years of technical accumulation in audio and video from three perspectives: “Audio and Video Codec”, “Audio and Video Transmission” and “Audio and Video Infrastructure”.
BIGO audio and video codec technology
First of all, audio and video codec technology is essential to a clear and smooth interactive experience. A codec algorithm removes temporal, spatial, and perceptual redundancy from audio and video content, so that video can achieve higher definition at low bandwidth. In practice, however, codec algorithms face two challenges: 1. how to use limited computing resources to encode tens of millions of videos in a timely manner; 2. video content varies widely, so how to adapt to different content to maximize the effectiveness of the encoding algorithm.
Through self-developed encoders, adaptive coding, adaptive noise reduction and other technologies, Bigo has effectively responded to the above challenges and improved the basic audio and video experience of related products. Here, we will focus on the self-developed encoder and adaptive coding technology.
1. Self-developed encoder technology
In the complex global network environment, users’ demands for better video quality grow day by day. Providing higher image quality at a lower bit rate and a faster speed is a challenge for encoder technology. The x265 encoder is an industry-recognized, excellent open source HEVC encoder. Compared with the previous-generation encoder x264, it can save 40+% of the bit rate at the same encoding speed and image quality, so a large number of audio and video companies have integrated it to greatly improve the basic video experience. As a basic video technology, however, BIGO believes that x265’s encoding performance does not fully exploit the compression capability of the HEVC standard. We therefore independently developed the Bigo265 encoder. The following is the encoding performance of Bigo265 on different types of test sequences. Compared with the veryslow preset of x265, Bigo265 saves 15% of the bit rate on average while running 5 times faster, reaching an industry-leading level.
We compared the compression performance of Bigo265 with other standard encoders on the Bigo Likee test set. As shown in the table below, on average, Bigo265 saves 37% of the bit rate while running 1.6 times faster.
In addition, we evaluated Bigo265 under the test conditions of the MSU encoder competition. As shown in the figure below, Bigo265 has reached the top level of the MSU encoder competition in recent years.
Three types of test sets are used here, with different resolutions and content complexity, covering a variety of video scenarios: Likee is video from the business side, JCTVC is the official test set of HM, and MSU is a complex mixed test set provided by Moscow State University. As can be seen from the above, Bigo265 has a clear encoding advantage on all of these test videos. The following figure shows the temporal and spatial complexity of the test sequences:
1) Bigo265 technology introduction
Built on an H.265 core, Bigo265 supports essentially all the encoding tools of the HEVC standard. In addition, it adds dozens of efficient fast algorithms, which greatly improve the encoding speed with little loss of image quality. For rate control, it supports a variety of methods including ABR, CBR, VFR, multi-pass, and CRF. The preprocessing stage has also been heavily optimized and can improve coding efficiency according to the video content. The following briefly introduces the cost-effective fast algorithm technology and the adaptive rate control technology.
Cost-effective fast algorithms: to meet the different speed and quality requirements of different businesses, Bigo265 provides eight encoding speed levels, and the fast algorithms in each level have a consistent cost-effectiveness. As the speed-quality curve in the figure below shows, the Bigo265 curve is approximately a straight line with a much lower slope than that of x265. Bigo265 can therefore trade speed against quality more smoothly when facing different business requirements, and the faster the encoding speed, the larger its advantage over x265.
Adaptive bit rate control: Bigo265 adaptively performs bit rate control according to video scene characteristics, content complexity, and frame type, and adjusts QP with block-level weights through AQ/CUTREE technology. It also provides an ROI interface, through which QP can be adaptively adjusted according to the user’s ROI area while keeping the bit rate stable.
2) Combining with the business side
Bigo265 has been deployed in the services of Likee and BigoLive, saving 20+% of the bandwidth while obtaining a better user viewing experience. At present, the encoder can be configured with multiple speed and image quality levels, and has been optimized for different application scenarios. Business departments can easily adjust and adapt to meet their own needs. In addition to focusing on the improvement of Bigo265 encoding efficiency and speed, Bigo’s codec team also began to develop new generation standard encoders such as AV1, VVC, and AVS3.
2. Adaptive Coding Technology (BigoCAE)
Traditional transcoding services use fixed coding parameters for transcoding, and cannot adaptively select optimal coding parameters according to the complexity of video content, resulting in waste of bit rate for simple videos and insufficient quality for complex videos. BigoCAE is committed to automatically identifying the complexity of video content to select a reasonable encoding strategy to achieve the best balance between quality and bit rate, save bit rate globally, and balance image quality.
In existing business, BigoCAE’s encoding prediction accuracy within [-2, +2] points of the target VMAF reaches 93%+. On 3000 test videos covering multiple resolutions and frame rates, the quality variance is significantly improved, low-quality cases are reduced, and the average bit rate is reduced by 40%+.
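The accuracy figure above can be read as the share of videos whose achieved VMAF lands within 2 points of the target. A minimal sketch of that metric follows; the function name `within_target_ratio` and the sample scores are hypothetical and only illustrate the calculation, not BigoCAE's evaluation code.

```python
def within_target_ratio(target_vmaf, achieved_vmaf, tolerance=2.0):
    """Fraction of videos whose achieved VMAF is within +/- tolerance of the target."""
    if len(target_vmaf) != len(achieved_vmaf) or not target_vmaf:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        1 for t, a in zip(target_vmaf, achieved_vmaf) if abs(a - t) <= tolerance
    )
    return hits / len(target_vmaf)


# Hypothetical scores: three of four encodes land inside the [-2, +2] band.
targets = [90.0, 90.0, 90.0, 90.0]
achieved = [91.5, 88.2, 93.1, 90.0]
print(within_target_ratio(targets, achieved))
```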
The BigoCAE content adaptive transcoding strategy is based on our self-developed Bigo265 encoder, which integrates content analysis (transfer learning, encoding feature analysis, etc.), AI encoding parameter prediction, and fine-grained rate control (frame-level code control, ROI code control) and other technologies to achieve stable quality and save bit rate.
The content analysis uses coding features and transfer learning features. The encoding feature uses the original code stream and pass1 fast encoding information, as shown in the following figure:
The transfer learning part takes a classic image classification network that has already been trained for CV applications and extracts the FC layer before its classifier as the input feature for AI encoding prediction.
In order to speed up the prediction speed and meet the real-time needs of the business, AI coding prediction adopts a simple shallow neural network, as shown in the following figure:
At the rate control level, the ROI area is adjusted adaptively. For example, the face area is the focus of interest, so we enhance the encoding of this area: with the overall bit rate unchanged, the ROI area is clearer than other areas.
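One simple way to picture ROI rate control is as a per-block QP offset map: ROI blocks get a lower QP (better quality) and the remaining blocks get a slightly higher QP so that the offsets average to zero. The sketch below assumes this zero-mean rule as a rough stand-in for "overall bit rate unchanged"; a real encoder would enforce the bit rate through its rate control loop. The function name and the default offset are hypothetical.

```python
def roi_qp_offsets(roi_mask, roi_delta=-4.0):
    """Return per-block QP offsets for a 2-D boolean ROI mask.

    ROI blocks receive roi_delta (negative means higher quality); background
    blocks receive a compensating positive offset so the mean offset is zero.
    """
    blocks = [flag for row in roi_mask for flag in row]
    n_roi = sum(blocks)
    n_bg = len(blocks) - n_roi
    if n_roi == 0 or n_bg == 0:
        # Nothing to trade against: leave every block untouched.
        return [[0.0 for _ in row] for row in roi_mask]
    bg_delta = -roi_delta * n_roi / n_bg
    return [[roi_delta if flag else bg_delta for flag in row] for row in roi_mask]


# A 2x2 block grid with a face detected in the top-left block.
for row in roi_qp_offsets([[True, False], [False, False]], roi_delta=-3.0):
    print(row)
```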
BigoCAE continues to evolve with the evolution of business. We will continue to introduce new features, new networks, and new code control algorithms to improve the content adaptive effect of BigoCAE algorithms.
BIGO audio and video transmission technology
Building industrial-grade “high availability”, “high versatility”, and “high quality assurance” audio and video transmission technology is very important for audio and video products, and different business scenarios have great differences in the focus of transmission technology optimization:
In addition, the network characteristics of different countries and regions are very different; the routing and link quality and charging methods across countries and continents are also very different. Different network types have their own behavior patterns and pipeline characteristics, and need to adapt to different transmission control strategies.
Finally, the types and methods of network access of users in different regions are very different, and users’ preferences for network traffic fees are also different.
Therefore, in the process of formulating transmission strategies, it is necessary to comprehensively consider and optimize the design of multi-dimensional situations such as the focus of business scenarios, network characteristics of different countries and regions, and users’ preferences for quality of experience and network payment.
Facing these key challenges, BIGO’s audio and video transmission technology has evolved continuously from initial design to actual deployment, and has built a complete basic system of transmission technology covering four key technical directions: network transmission congestion control, weak network countermeasures, adaptive bit rate playback control, and access routing strategy optimization.
These four key technologies occupy an important position in the entire audio and video solution, and we will introduce them one by one.
1. Network transmission congestion control technology
If you imagine the Internet as a highway system, each Internet path is like a highway. When too much data enters the network, it is like congestion in the highway system due to insufficient transportation capacity of certain nodes. This data congestion is commonly referred to as link congestion.
Congestion control research has gone through more than 30 years, and many congestion control algorithms have emerged. Some representative algorithms are shown in the figure.
1.1 BTP Congestion Control System
In the BigoLive live broadcast system, live broadcast is sensitive to loss of clarity and relatively insensitive to delay. For these characteristics, BIGO has accumulated a complete congestion control solution, the BTP congestion control system. Online, it achieves an average zero-freeze rate of more than 94%, a 720p share of more than 30%, and an average delay of less than 2s, which is at the top level in the industry.
The BTP congestion control system is a scenario-specific control system whose main algorithm is TFRC. TFRC sends at a controlled rate, behaves smoothly, and is well suited to streaming media transmission, but it suffers from low throughput under random packet loss and high latency in low-bandwidth scenarios.
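To see why a loss-driven algorithm such as TFRC loses throughput under random packet loss, it helps to look at the standard TFRC throughput equation from RFC 5348. The sketch below implements that public equation only, with the RFC's recommended simplifications (b = 1, t_RTO = 4 x RTT); it is not BTP's implementation, and the function name and sample numbers are illustrative.

```python
import math


def tfrc_rate_bps(packet_size_bytes, rtt_s, loss_event_rate, b=1):
    """Allowed sending rate in bits per second from the RFC 5348 throughput equation."""
    p = loss_event_rate
    if rtt_s <= 0 or not 0 < p <= 1:
        raise ValueError("rtt_s must be positive and loss_event_rate in (0, 1]")
    t_rto = 4 * rtt_s  # simplification recommended by RFC 5348
    denominator = rtt_s * math.sqrt(2 * b * p / 3) + (
        t_rto * 3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p * p)
    )
    return 8 * packet_size_bytes / denominator


# Same path (1200-byte packets, 100 ms RTT), increasing loss event rate:
# the allowed rate falls quickly even if the loss is random rather than congestive.
for loss in (0.01, 0.05, 0.20):
    print(loss, round(tfrc_rate_bps(1200, 0.1, loss) / 1000), "kbps")
```

Because the equation cannot tell random loss from congestion loss, any random loss on a wireless link directly lowers the allowed rate, which is the weakness the following paragraphs address.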
Low throughput under random packet loss: the congestion algorithm treats packet loss as a congestion signal. On wireless networks (Wi-Fi/2G/3G/4G), channel fading and signal interference inherent to the wireless channel cause some random packet loss. This loss makes this type of algorithm misjudge congestion, resulting in lower throughput. In order to solve the problem of random packet loss scenarios, we prep +800kbps speed limit), accurately filter out random packet loss, and retain congestion packet loss as a congestion control signal.
High latency in low-bandwidth scenarios: for low-bandwidth networks below 600kbps, a typical feature is a long router buffer queue. By the time packet-loss congestion is detected, data accumulated in the buffer may wait as long as 10s, which seriously affects the live broadcast experience. To this end, we introduce the auxiliary algorithm slops, a delay-based congestion control algorithm that can accurately infer the delay type and network state and produce the corresponding congestion control output.
With these algorithms combined, the BTP congestion control system has been verified in a laboratory simulation environment: it withstands more than 40% random packet loss, remains usable at bandwidth as low as 300kbps, and still works normally with 1200ms of network jitter.
Build a system simulation platform: Relying on the Pantheon + mahimahi platform, we have expanded and enriched the input types of network traces, improved CC algorithm performance analysis tools, and formed a complete system simulation and analysis platform.
Online closed-loop verification: we take one country as an entry point, analyze the network traces of users in that country, and run simulation comparisons against different congestion control algorithms. The results are shown in Figure 2 below. Compared with the other CC algorithms, ours ranks near the top.
At the same time, online A/B experiments show throughput +0.74% and freeze rate -0.38%, which is consistent with the offline evaluation. Based on this big-data-driven algorithm optimization system, we have launched several optimization items and achieved significant benefits.
2. Weak network confrontation technology
User access networks take complex forms (wireless channels in particular have obvious channel fading and signal interference), and they carry diverse services. During transmission, network conditions can change and deteriorate sharply. In terms of network transmission capability, this means low available bandwidth, large end-to-end delay, and a high packet loss rate, all of which greatly affect users’ transmission performance. Existing technology may not be able to guarantee even the minimum QoS requirements.
With the help of the network trace collection system, we analyzed the characteristics of BIGO’s global user networks along different dimensions. Taking the user network in a certain region as an example: for bandwidth, connections with an average bandwidth below 500kbps account for about 1%; for packet loss, the overall average packet loss rate is 7.2%, connections with a packet loss rate above 20% account for about 10%, and random packet loss accounts for about 66% of loss; for delay, connections with an average RTT above 380ms account for about 30%. Weak network countermeasures can therefore be divided mainly into anti-packet loss technology and anti-jitter technology.
2.1 Anti-packet loss technology
Two well-known anti-packet loss techniques are Automatic Repeat reQuest (ARQ) and Forward Error Correction (FEC). Each has its own advantages and disadvantages: ARQ maximizes bandwidth utilization but introduces additional delay, while FEC avoids additional delay by adding redundant information, at the cost of bandwidth utilization.
Given the characteristics of these two methods, we adopted a strategy that plays to the strengths of each and avoids their weaknesses: HARQ (Hybrid ARQ). The overall idea is that in a network with a small RTT, HARQ mainly uses ARQ to reduce redundant traffic; in a scenario with a large RTT, it mainly uses FEC to reduce the recovery delay.
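The RTT-driven choice between ARQ and FEC can be sketched as a small decision function. Everything specific here is an assumption for illustration: the 400ms delay budget, the residual-loss target, the 1.5x redundancy margin, and the names `RecoveryPlan` and `plan_recovery` are hypothetical and are not taken from BIGO's HARQ.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    max_retransmissions: int  # ARQ rounds that fit in the delay budget
    fec_redundancy: float     # redundant packets per media packet


def plan_recovery(rtt_ms, loss_rate, delay_budget_ms=400.0, target_residual=1e-3):
    """Prefer ARQ when enough round trips fit in the delay budget, else add FEC."""
    if rtt_ms <= 0 or not 0 <= loss_rate < 1:
        raise ValueError("rtt_ms must be positive and loss_rate in [0, 1)")
    affordable = int(delay_budget_ms // rtt_ms)
    # Loss remaining after the original send plus every affordable retransmission,
    # assuming independent losses.
    residual = loss_rate ** (affordable + 1)
    if residual <= target_residual:
        return RecoveryPlan(affordable, 0.0)
    # ARQ alone is too slow: cover the expected loss with FEC plus a margin.
    redundancy = min(1.0, 1.5 * loss_rate / (1.0 - loss_rate))
    return RecoveryPlan(affordable, redundancy)


print(plan_recovery(rtt_ms=50, loss_rate=0.1))   # small RTT: ARQ only
print(plan_recovery(rtt_ms=380, loss_rate=0.1))  # large RTT: FEC added
```

Making one decision for both mechanisms, instead of running ARQ and FEC independently, is what avoids paying for redundancy on paths where retransmission alone is fast enough.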
Comparing HARQ with ARQ+FEC in different scenarios, we can see that the HARQ recovery rate in the rate-limited scenario is significantly improved. In addition, the extra traffic introduced is significantly reduced: HARQ’s unified decision-making effectively lowers the traffic cost.
We test the actual anti-packet loss effect of HARQ technology through the evaluation method of audio transmission quality MOS. According to the comparison results, the anti-packet loss module significantly improves the MOS. The MOS score can be maintained above 4 when the packet loss rate is below 40%, and the sound quality is relatively stable, which effectively improves the user experience.
Frontier exploration of FEC technology: in audio and video transmission, Reed-Solomon (RS) coding is commonly used to combine several video frames into one FEC block for encoding. At the same redundancy, a longer FEC block can tolerate more packet loss, but at the cost of video playback delay. BIGO proposes a new scheme (RE-RS), which scans several consecutive video frames with a sliding window and generates a set of redundant packets each time the window expands or moves by one frame. Under common network packet loss distributions, RE-RS can recover data in a timely and efficient manner.
Figure: Schematic diagram of sliding window coding
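The sliding-window idea can be illustrated with a deliberately simplified code. The sketch below uses a single XOR parity packet per window position instead of Reed-Solomon, so it can repair only one lost packet per window; it is not RE-RS, and the function names and equal-length packet assumption are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def window_parity(packets, end, window):
    """Parity over the `window` most recent packets ending at index `end`."""
    start = max(0, end - window + 1)
    parity = bytes(len(packets[end]))
    for pkt in packets[start:end + 1]:
        parity = xor_bytes(parity, pkt)
    return parity


def recover_one(received, end, window, parity):
    """Rebuild the single missing packet in the window, or return None.

    `received` maps packet index to payload for the packets that arrived.
    """
    start = max(0, end - window + 1)
    missing = [i for i in range(start, end + 1) if i not in received]
    if len(missing) != 1:
        return None
    payload = parity
    for i in range(start, end + 1):
        if i != missing[0]:
            payload = xor_bytes(payload, received[i])
    return missing[0], payload


packets = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
parity = window_parity(packets, end=3, window=3)  # covers packets 1..3
print(recover_one({1: packets[1], 3: packets[3]}, end=3, window=3, parity=parity))
```

Because a new parity packet is emitted every time the window advances, a loss can be repaired as soon as the next window's redundancy arrives, rather than waiting for a long fixed block to complete.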
We test RE-RS in experiments with a controlled random packet loss rate. As shown in the table below, RE-RS achieves a higher recovery rate than RS at every configured packet loss rate.
2.2 Anti-jitter technology
To adapt to changing networks and user scenarios, we designed BigoJitter. It consists of modules such as a voice packet buffer, network jitter estimator, playback delay estimator, playback decision maker, decoder, shifter, and decoded data buffer. Its core algorithms are network jitter estimation, playback delay estimation, and playback strategy formulation. BigoJitter uses the historical jitter range and an autoregressive algorithm to estimate playback delay, so it can quickly adapt to changes in network jitter.
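A playback delay estimator that combines a historical jitter range with an autoregressive update might look like the sketch below. It is only an illustration under stated assumptions: the class name, the 50-sample history, the first-order smoothing factor, and the "jump up at once, decay slowly" rule are hypothetical choices, not BigoJitter's algorithm.

```python
from collections import deque


class PlayoutDelayEstimator:
    """Track the recent range of packet transit times and smooth it over time."""

    def __init__(self, history=50, alpha=0.9):
        self.samples = deque(maxlen=history)  # recent transit times in ms
        self.alpha = alpha                    # weight of the previous estimate
        self.delay_ms = 0.0

    def update(self, transit_ms):
        self.samples.append(transit_ms)
        jitter_range = max(self.samples) - min(self.samples)
        if jitter_range > self.delay_ms:
            # Jitter grew: follow it immediately to avoid audio gaps.
            self.delay_ms = float(jitter_range)
        else:
            # Jitter shrank: first-order autoregressive decay toward the new range.
            self.delay_ms = (
                self.alpha * self.delay_ms + (1 - self.alpha) * jitter_range
            )
        return self.delay_ms


estimator = PlayoutDelayEstimator(history=3)
for transit in (100, 100, 100, 160, 100, 100, 100):
    print(transit, round(estimator.update(transit), 1))
```

The asymmetric update is a common design choice for jitter buffers: under-estimating delay causes audible gaps, while over-estimating only adds latency, so the estimate rises faster than it falls.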
We again use the audio transmission quality MOS evaluation method. As shown in the figure below, BigoJitter’s resistance under various weak network conditions is excellent.
3. Adaptive bit rate playback control technology
To cope with the huge differences in bandwidth among users in different regions of the world, Bigo has developed real-time on-demand transcoding and adaptive bit rate control. In the example shown in the figure, the cloud aggregates the bit rates the audience is actually watching and performs transcoding and distribution on demand, saving transcoding compute and network transmission resources.
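The on-demand idea can be sketched as: work out which rung of the bit rate ladder each viewer needs, then transcode only the rungs that at least one viewer is watching. The ladder values, the function name, and the rule "highest rung not exceeding the viewer's bandwidth" are illustrative assumptions.

```python
def rungs_to_transcode(ladder_kbps, viewer_bandwidth_kbps, source_kbps):
    """Return the sorted ladder rungs that must be transcoded for this audience."""
    needed = set()
    for bandwidth in viewer_bandwidth_kbps:
        fitting = [rung for rung in ladder_kbps if rung <= bandwidth]
        # Viewers below the lowest rung still get the lowest rung.
        needed.add(max(fitting) if fitting else min(ladder_kbps))
    # The source stream is forwarded as-is, so it needs no transcoding.
    needed.discard(source_kbps)
    return sorted(needed)


ladder = [400, 800, 1500, 3000]
viewers = [5000, 900, 850, 4000]
print(rungs_to_transcode(ladder, viewers, source_kbps=3000))
```

With this audience only one extra rendition is produced, instead of the three a fixed ladder would always generate.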
3.1 Live/VOD Adaptive Bit Rate Control
In live and video-on-demand scenarios, we have developed and implemented an adaptive bit rate algorithm based on model predictive control (MPC). It analyzes user characteristics and preferences, predicts download bandwidth and buffer length changes, and models resolution/bit rate selection as a dynamic optimization problem whose objective is the user’s viewing experience index, QoE (Quality of Experience).
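A minimal sketch of MPC-style bit rate selection is shown below: enumerate every bit rate plan over a short horizon, simulate the playback buffer with the predicted bandwidth, score each plan, and apply only the first decision. The linear QoE proxy (bit rate reward minus stall and switch penalties) and all parameter values are assumptions for illustration; the QoE model described in this article is the engagement-based one, not this formula.

```python
from itertools import product


def mpc_choose_bitrate(bitrates_kbps, predicted_bw_kbps, buffer_s, last_bitrate_kbps,
                       chunk_s=2.0, rebuffer_penalty=4.0, switch_penalty=1.0):
    """Pick the next chunk's bit rate by brute-force search over the horizon."""
    horizon = len(predicted_bw_kbps)
    best_score, best_first = float("-inf"), bitrates_kbps[0]
    for plan in product(bitrates_kbps, repeat=horizon):
        buf, prev, score = buffer_s, last_bitrate_kbps, 0.0
        for rate, bandwidth in zip(plan, predicted_bw_kbps):
            download_s = rate * chunk_s / bandwidth
            stall_s = max(0.0, download_s - buf)       # playback freezes
            buf = max(0.0, buf - download_s) + chunk_s  # chunk added to buffer
            score += (rate / 1000.0
                      - rebuffer_penalty * stall_s
                      - switch_penalty * abs(rate - prev) / 1000.0)
            prev = rate
        if score > best_score:
            best_score, best_first = score, plan[0]
    return best_first


ladder = [300, 750, 1200]
print(mpc_choose_bitrate(ladder, [5000, 5000, 5000], buffer_s=10.0, last_bitrate_kbps=1200))
print(mpc_choose_bitrate(ladder, [200, 200, 200], buffer_s=1.0, last_bitrate_kbps=300))
```

The search cost grows exponentially with the horizon, so practical systems keep the horizon short or precompute decisions; the quality of the result depends directly on how well the score reflects real viewer experience.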
It can be seen from the framework diagram that accurately and effectively predicting QoE is the most critical factor affecting the entire adaptive bit rate algorithm. After continuous effort, BIGO has therefore developed and implemented a QoE prediction model based on User Engagement.
3.2 QoE prediction model based on User Engagement
We propose a viewing experience index QoE that combines playback technical indicators with user engagement, further narrowing the gap between the QoE model and users’ actual subjective experience.
Feature selection: unlike traditional QoE formulas, in addition to transmission technical indicators we also use original features such as user geographic location, phone software and hardware attributes, and user-video interaction, as well as new features generated by feature crossing. Then, based on correlation and feature importance, new feature subsets are generated by round-by-round screening. The figure below shows the Pearson Correlation Coefficient (PCC, left) between several key features and the target, user engagement, as well as feature importance (measured by gain, right) in a boosted tree model.
Parameter tuning: on the selected feature set, we use Bayesian optimization with K-fold cross-validation to search for the optimal hyperparameters of the boosted tree model. Bayesian optimization treats the function being optimized as a black box drawn from a Gaussian process. In each round it determines the next set of hyperparameters to try, $x_{\text{next}} = \arg\max_x a(x)$, by maximizing the acquisition function $a(x)$, where $a(x)$ is the expected improvement,
$$a(x) = s(x)\,\bigl(b(x)\,F(b(x)) + N(b(x))\bigr)$$
where $N(\cdot)$ and $F(\cdot)$ are the density and cumulative distribution functions of the standard Gaussian distribution, $m(x)$ and $s(x)$ are the estimates of the mean and standard deviation of the Gaussian process based on the existing observations, $b(x) = (f(x_{\text{best}}) - m(x)) / s(x)$, and $x_{\text{best}}$ is the current optimal parameter.
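As a hedged sketch, the acquisition function a(x) can be evaluated with the Python standard library alone. The function names, the candidate tuples, and the zero-spread fallback are assumptions for illustration; the formula is taken as written, which treats f as a quantity to be minimized (a candidate improves on the best when its predicted mean is lower).

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """a(x) = s(x) * (b(x) * F(b(x)) + N(b(x))) with b(x) = (best - mean) / std."""
    if std <= 0.0:
        # No uncertainty left at this point: improvement is deterministic.
        return max(0.0, best - mean)
    b = (best - mean) / std
    return std * (b * std_normal_cdf(b) + std_normal_pdf(b))


def pick_next(candidates, best):
    """candidates: iterable of (x, predicted_mean, predicted_std) tuples."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))[0]


# Hypothetical surrogate predictions of validation loss for two settings.
candidates = [("depth=4", 0.50, 0.01), ("depth=8", 0.40, 0.01)]
print(pick_next(candidates, best=0.45))
```

In a full loop, the surrogate would be refit after each evaluation and the K-fold cross-validation score of the boosted tree model would play the role of f.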
Hyperparameter optimization generally selects more aggressive parameters to generate complex tree models, which brings additional deployment costs. Therefore, we comprehensively consider the fitting accuracy, model size, and calling time of the model, generate a series of models of the Pareto frontier, and deploy them according to actual online needs. The figure below shows the first 3 boosted tree structures of a Pareto efficient compression model.
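Selecting the Pareto frontier among candidate models can be sketched as a dominance filter over (fitting error, model size, call time), where lower is better on every axis. The model names and numbers below are hypothetical.

```python
def dominates(a, b):
    """True if cost tuple a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(models):
    """models: dict of name -> (error, size_mb, call_time_ms); returns frontier names."""
    return [
        name
        for name, cost in models.items()
        if not any(dominates(other, cost)
                   for other_name, other in models.items() if other_name != name)
    ]


models = {
    "big": (0.10, 50.0, 9.0),
    "mid": (0.12, 20.0, 4.0),
    "small": (0.15, 5.0, 1.0),
    "bad": (0.16, 25.0, 5.0),  # worse than "mid" on every axis
}
print(pareto_front(models))
```

Every model left on the frontier represents a different trade-off, so the deployment choice can follow the accuracy, size, and latency limits of each online service.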
Model application and benefits: we use the QoE fitting model to optimize the definition level selection algorithm for Bigo Likee short videos, which improves user viewing satisfaction and saves bandwidth on Bigo servers.
4. Access routing strategy optimization technology
Bigo users cover hundreds of countries and regions around the world, and network conditions and quality differ greatly between them. Relying on Bigo’s dozens of data centers around the world and its powerful AI and big data analysis capabilities, we have implemented a complete “real-time intelligent routing scheduling” system:
a) Extract multi-dimensional transmission quality indicators from massive historical transmission data, map them to quality scores according to the different QoS and QoE requirements of different product forms, and finally generate a benchmark routing table with operator-level granularity;
b) Continuously update and dynamically adjust the benchmark routing table by tracking changes in network quality at different time granularities;
c) Monitor transmission path quality in real time and, on sudden network congestion or network failure, switch intermediate forwarding nodes promptly and dynamically, reducing the impact of network problems on transmission quality.
Through “real-time intelligent routing and scheduling”, we provide a stable and high-quality path for media data transmission, especially cross-border and intercontinental data transmission.
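The three steps above can be sketched in a few functions: score each path from its metrics with product-specific weights, build an operator-level table from historical samples, and switch relays when the live score of the current path degrades. The metric names, weights, relay names, and the switch threshold are all hypothetical.

```python
def path_score(metrics, weights):
    """Weighted cost of a path; lower is better."""
    return sum(weights[key] * metrics[key] for key in weights)


def build_routing_table(history, weights):
    """history: {(operator, relay): [metrics, ...]} -> {operator: best relay}."""
    best = {}
    for (operator, relay), samples in history.items():
        average = {k: sum(s[k] for s in samples) / len(samples) for k in weights}
        score = path_score(average, weights)
        if operator not in best or score < best[operator][1]:
            best[operator] = (relay, score)
    return {operator: relay for operator, (relay, _) in best.items()}


def maybe_switch(current_relay, live_metrics, weights, limit):
    """Keep the current relay unless its live score exceeds the limit."""
    if path_score(live_metrics[current_relay], weights) <= limit:
        return current_relay
    return min(live_metrics, key=lambda r: path_score(live_metrics[r], weights))


# A live-video product might weight loss and jitter like this (assumed values).
weights = {"loss": 1000.0, "rtt_ms": 1.0, "jitter_ms": 2.0}
history = {
    ("op-a", "relay-1"): [{"loss": 0.01, "rtt_ms": 80, "jitter_ms": 10}],
    ("op-a", "relay-2"): [{"loss": 0.05, "rtt_ms": 60, "jitter_ms": 10}],
}
table = build_routing_table(history, weights)
live = {
    "relay-1": {"loss": 0.20, "rtt_ms": 300, "jitter_ms": 50},  # sudden congestion
    "relay-2": {"loss": 0.05, "rtt_ms": 60, "jitter_ms": 10},
}
print(table, maybe_switch(table["op-a"], live, weights, limit=200.0))
```

Changing only the weights gives a different table for a product with different QoS and QoE priorities, which is how one set of measurements can serve several product forms.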
BIGO global network infrastructure construction
Judging from the experience BIGO has accumulated over many years of large-scale business, high-quality audio and video services are inseparable from deep customization of the infrastructure. BIGO chose to build its own global network infrastructure in order to provide end-to-end, turnkey technical solutions for its businesses.
From the perspective of audio and video business scenarios, the challenges of building a high-quality global RTN network fall into two parts: (1) how to ensure the quality of access from a large number of users to each data center; (2) how to ensure the communication quality between DCs. The following introduces BIGO’s work in these two areas.
1. User access network quality optimization
Whether for hosts or viewers, user access is the most important link affecting quality of service, and it also faces the most complex and diverse network environment. BIGO selects the cities with the richest operator resources on each continent and in key countries to build its Internet access nodes, which are managed through the BIGO Internet eXchange (BIX for short). Optimization focuses on the following three aspects.
BIX = Bigo Internet eXchange, IPT = IP Transit Provider, IX = Internet eXchange
Peer = BGP Private Peer
(a) Close to users, connected to the world. In the cities closest to users, BIGO builds its own high-quality BGP egress networks and peers with a large number of local ISPs; connection methods include IPT, IX, and Peer. At present, BIGO has established 170+ peers with operators worldwide and has in-depth technical cooperation with more than 20,000 ISPs.
(b) DC to user, intelligent routing. The network quality from data centers to users is analyzed in real time, including key indicators such as packet loss, jitter, RTT, and connection success rate, and available egress paths are mapped in real time based on Internet routing changes. An intelligent scheduling controller ensures optimal routing to the user network.
(c) User to DC, dynamic preference. The quality of user access to different data centers is analyzed in real time and combined with indicators such as data center load and user quality to dynamically adjust the data center allocation strategy, so that users of each operator in each region access the data center with the best overall score, improving user experience.
2. Communication quality optimization between DCs, providing a 99.99% high-quality transmission rate
(a) Same-city interconnection, high-speed and reliable. In Europe, America, Southeast Asia, India and other regions, multiple MANs have been built, and physical fiber resources are fully exploited through wavelength division multiplexing to provide an ultra-high-bandwidth, highly reliable inter-DC network for interconnecting data centers in the same city.
(b) Global dedicated physical lines, intelligent scheduling. Between data centers worldwide, across regions and continents, BIGO has built its own global backbone network over submarine cables to connect its major metropolitan area networks; SDN controllers provide congestion control, traffic scheduling, and fault self-healing, achieving stable, high-speed interconnection of global data centers. To ensure quality, submarine cable links are generally physically redundant, for example by using two different submarine cables at the same time.
(c) Virtual fiber, public network private line. Even with physically redundant submarine cables, upper-layer services can still become unavailable in scenarios such as traffic bursts, submarine cable maintenance, explosive business growth, and rapid launch of new nodes. BIGO’s self-developed “public network private line” (BVTS) system decouples the network from its dependence on physical private lines and adds technologies such as anti-packet loss, TCP compression, and encryption. It enables rapid provisioning of private lines and rapid service launch, and improves the reliability of the backbone network.
(d) Multi-dimensional analysis, intelligent routing. Beyond quality assurance of the underlying links, a layer of optimization is added at the application layer. A self-developed IP-to-IP (point-to-point) intelligent routing system between DCs around the world makes comprehensive, automatic decisions based on real-time network status, path load, available bandwidth, cost, and other factors, and can choose the best transmission path for different business needs: audio and video services require low latency and can accept a small amount of packet loss, while signaling services are highly sensitive to packet loss but can tolerate slightly higher latency.
After years of accumulation and development, BIGO has built nearly 100 IDCs in Asia, Europe, America and other parts of the world, with an egress capacity of 40T, and has carried out in-depth technical cooperation with more than 20,000 ISPs, covering 150 countries and regions around the world. It provides a 99.99% high-quality transmission rate between DCs around the world.
This article has introduced ten years of technical accumulation behind BIGO’s audio and video solutions from three main perspectives: audio and video encoding and decoding, audio and video transmission, and infrastructure construction. Technology never stands still. BIGO continues its research in audio and video technology to maintain its industry leadership, in areas such as intelligent localization of network quality problems, refined network type segmentation and scenario-based algorithm optimization, understanding and evaluation of users’ subjective experience, AI-based codec algorithms, HDR10 and 4K optimization, next-generation codec standards, and more.
As of 2020, BIGO delivers the best audio-visual services from Europe to Asia and from the Americas to Africa, to every corner of the world and to every person who loves life. Along the way, core audio and video technology and BIGO’s business growth have reinforced each other over the years. Through trial and error in large-scale, real-market applications, they have been forged into a rock-solid BIGO audio and video technology solution.
Link to this article：Ten years of sharpening a sword – BIGO global audio and video technology solutions