Welcome to SeeMoreDigital.net

This site is dedicated to Audio and Video encoding, together with the formats used.


 

Info about... Mpeg4 Settings and the Terminology Used

Simple Profile Settings

 

A typical Mpeg4 encoded video consists of Intra-frames (I-frames) and Predicted-frames (P-frames). More recent Mpeg4 encoders also allow the user to encode using Bi-directional frames (B-frames).

 

I-frames

An I-frame is encoded using only information from within its own frame. It does not use any information from other frames (ie there is no temporal compression). An I-frame is similar in concept to encoding a single frame with, say, JPEG.

 

Technically speaking, an I-frame is one in which all of the macro-blocks are stored as images rather than as motion vectors. Encoding a collection of pixels in the form of a static block is the most expensive method in terms of storage and hence I-frames are the most expensive type of frame.

 

Fewer I-frames in the video generally translates to better compression, and I-frames are normally used by the encoder only when too few blocks can be tracked from the reference frame by the motion search algorithm. Even so, I-frames serve a very important purpose. As all of the blocks in an I-frame are stored as images, decoding an I-frame reveals a complete picture without any dependency on reference frames.

 

For this reason I-frames are also known as key-frames, and they are the only type of frame completely independent of all others.

 

P-frames

P-frames are forward predicted and may use either an I-frame or a P-frame as a point of reference. Each one is encoded from the frame that precedes it. In any video sequence a group of frames will share much of the same picture content. For example, if you watch a news presenter, you will notice that the area (scene) behind the presenter stays almost identical from frame to frame. So instead of encoding each frame as a totally new frame (remembering that PAL video runs at 25 frames per second) you can exploit the redundancy between frames by using P-frames.

 

Essentially a P-frame records where a block from the previous frame has moved to in the current frame. So instead of spatially encoding the frame (as with JPEG) the P-frame just says: "Hey, the block in the previous frame has moved to location (X,Y)", which requires much less data than encoding each frame spatially.

In other words, only the differences between frames are recorded, which is more efficient than encoding a complete I-frame every time.

 

Technically speaking, a P-frame is one which can contain blocks that have been forward predicted via a motion vector from the previous frame by the motion search. It is normally unlikely that all of the blocks in a P-frame can be predicted, and where a block cannot be tracked from the previous frame, an intra-block is used in its place, similar to those found in an I-frame.
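
The motion search and the intra-block fallback can be sketched roughly as follows. This is only an illustration, not the search used by any particular encoder: the function names, the 4x4 block size, the search range and the intra threshold are all assumptions made for the example, and frames are plain lists of brightness values. The vector returned points from the block in the current frame back to its best match in the previous frame.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def search_block(cur, ref, bx, by, size=4, search_range=2, intra_threshold=40):
    """Return ("inter", (dx, dy)) for the best match, or ("intra", None) if none is good enough."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate positions that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    # Hypothetical rule: if even the best match is poor, store the block as an image instead.
    if best is None or best[0] > intra_threshold:
        return ("intra", None)
    return ("inter", best[1])


def make_frame(square_x, square_y, width=8, height=8, size=4):
    """Black test frame containing one bright square (illustration only)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(square_y, square_y + size):
        for x in range(square_x, square_x + size):
            frame[y][x] = 200
    return frame


previous = make_frame(1, 1)
current = make_frame(2, 1)  # the square has moved one pixel to the right
print(search_block(current, previous, 2, 1))
```

In the example the square is found one pixel to the left in the previous frame, so the block is stored as a motion vector. If the previous frame were completely black, no candidate would be close enough and the block would be stored as an intra-block.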

 

Because P-frames reconstruct much of the frame by applying motion vectors to the previous frame (ie: motion compensation) they are far less expensive in terms of storage than I-frames. One or more P-frames may follow an I-frame.

 

Therefore a greater ratio of P-frames to I-frames leads to a higher compression ratio.
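
As a rough illustration of why this is so, the sketch below works out the average frame size for different key-frame intervals. The frame sizes are made-up figures chosen for the example, not measurements from any encoder, and the function name is hypothetical.

```python
def average_frame_size(i_frame_bytes, p_frame_bytes, keyframe_interval):
    """Average bytes per frame when one I-frame starts every keyframe_interval frames."""
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be at least 1")
    p_frames = keyframe_interval - 1
    total = i_frame_bytes + p_frames * p_frame_bytes
    return total / keyframe_interval


# Hypothetical sizes: 30,000 bytes per I-frame and 6,000 bytes per P-frame.
for interval in (1, 12, 250):
    print(interval, average_frame_size(30000, 6000, interval))
```

With these figures, an all-I-frame video averages 30,000 bytes per frame, while one I-frame every 250 frames averages just over 6,000 bytes per frame.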

Advanced Simple Profile (ASP) Settings

 

B-frames (Bi-directional encoding or B-VOP)

B-frames are also predicted, but they do this by choosing the best prediction match from two reference frames: one that precedes the B-frame (forward prediction) and one that follows it (backward prediction). Each reference frame can be either an I-frame or a P-frame.

 

Using B-frames reduces the amount of data needed to code a frame and improves quality more specifically in areas where moving objects reveal hidden areas.
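
To picture the 'hidden areas' case, the sketch below lets one block of a B-frame pick the better of three candidate predictions: the earlier reference, the later reference, or the average of the two. It is a simplified illustration only. The function names and pixel values are invented for the example, and a real encoder would also search for motion vectors in both references.

```python
def block_error(block, prediction):
    """Sum of absolute differences between a block and a candidate prediction."""
    return sum(abs(a - b) for a, b in zip(block, prediction))


def choose_b_prediction(block, past_block, future_block):
    """Pick the candidate prediction with the smallest error for one block."""
    average = [(p + f + 1) // 2 for p, f in zip(past_block, future_block)]
    candidates = {
        "forward": past_block,      # predicted from the earlier reference frame
        "backward": future_block,   # predicted from the later reference frame
        "interpolated": average,    # average of both reference frames
    }
    errors = {mode: block_error(block, pred) for mode, pred in candidates.items()}
    best_mode = min(errors, key=errors.get)
    return best_mode, errors[best_mode]


# In the earlier frame a bright object still covers this block.
past = [200, 200, 200, 200]
# In the later frame the object has moved away and the background is visible.
future = [50, 60, 70, 80]
# In the B-frame itself the background has already been revealed.
print(choose_b_prediction([50, 60, 70, 80], past, future))
```

Here the earlier frame is no use as a reference, because the background was still hidden, so the block is predicted from the later frame instead.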

 

All the major Mpeg4 encoding companies are constantly upgrading their B-frame encoding techniques by improving how motion estimation is performed on these frames, and by improving the precision and quantization modulation.

 

It's worth pointing out that some DVD/Mpeg4 stand-alone players (including the Sigma Xcard) can't handle more than one consecutive B-frame at a time. The current version of DivX5.1.x generates one B-frame as standard. However, XviD generates two consecutive B-frames by default, which you may have to override and set to one.

 

Global Motion Compensation (S-VOP)
GMC helps to improve visual quality and data compression ratios. GMC techniques make use of the common elements of zooming and panning scenes to reduce data requirements and compress the video more effectively.


Try and imagine how in a sequence where a large-scale panning or zooming occurs, the data required to represent each frame in that sequence can be reused to a higher extent. (After all, most of the scene remains the same; it is only panning or zooming across the camera frame.) GMC makes use of this commonality to reduce the amount of data required to encode these types of scenes.
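
A very rough way to see the saving is to count how many motion vectors have to be stored. The sketch below is a deliberate simplification: it treats a pan as a single translation shared by most blocks, which is not how GMC is actually specified, and the function name and figures are assumptions for the example.

```python
from collections import Counter


def global_motion_savings(block_vectors):
    """Compare one vector per block against one global vector plus the exceptions."""
    global_vector, _ = Counter(block_vectors).most_common(1)[0]
    exceptions = [v for v in block_vectors if v != global_vector]
    vectors_without_gmc = len(block_vectors)
    vectors_with_gmc = 1 + len(exceptions)
    return global_vector, vectors_without_gmc, vectors_with_gmc


# Hypothetical pan: 99 blocks all move 8 pixels sideways, one block stays still.
vectors = [(8, 0)] * 99 + [(0, 0)]
print(global_motion_savings(vectors))
```

In this made-up pan, 100 separate vectors collapse to one global vector plus a single exception.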
 

Quarter Pel Motion Estimation (Qpel)

As explained in the frame type summaries above, data is reduced when the difference between two frames (the prediction error) is transmitted instead of the entire image being sent. The difference between successive frames is generally computed on a macro-block by macro-block basis (16x16 pels) or on a block by block basis (8x8 pels).

 

For example, a part of an image located in a block at grid location (1,1) may move to grid location (1,2) in the next frame. As you may realize, real movement is rarely a whole number of pixels, so a block will likely need more accuracy than movement limited to integer pixel units such as (1,1).

 

Typically Mpeg4 uses Half Pel prediction and encoding techniques, which allow positions such as (1.5, 1.5). Quarter Pel techniques allow positions such as (1.25, 1.75): they perform specific filtering on each block to produce a virtual block that represents how the original block should appear if moved by 1/4 of a pixel unit.
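
The idea of a 'virtual block' at a fractional position can be sketched with simple linear interpolation along one row of pixels. This is an assumption made to keep the example short; the filtering actually used for Qpel is more elaborate, and the function names and pixel values below are invented.

```python
def sample_at(row, position):
    """Linearly interpolated pixel value at a fractional position in a row."""
    left = int(position)
    fraction = position - left
    if fraction == 0:
        return float(row[left])
    return row[left] * (1 - fraction) + row[left + 1] * fraction


def prediction_error(current, reference, shift):
    """Total error when current is predicted from reference moved by shift pixels."""
    return sum(abs(c - sample_at(reference, i + shift)) for i, c in enumerate(current))


def best_shift(current, reference, step):
    """Try shifts from 0 up to (but not including) 1 pixel in increments of step."""
    shifts = [k * step for k in range(int(1 / step))]
    return min(shifts, key=lambda s: prediction_error(current, reference, s))


reference = [10, 20, 40, 80, 160]
# Hypothetical current row: the same picture moved by exactly a quarter of a pixel.
current = [sample_at(reference, i + 0.25) for i in range(4)]

half = best_shift(current, reference, 0.5)
quarter = best_shift(current, reference, 0.25)
print(half, prediction_error(current, reference, half))
print(quarter, prediction_error(current, reference, quarter))
```

With half pel steps the best available shift still leaves a prediction error, whereas the quarter pel search finds the 0.25 shift and the error drops to zero.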

 

It's worth noting that Qpel generally decreases compressibility. Although it can find a better match, it usually doesn't 'correct' this match anyway, because the errors are too small. If there is no good match, Qpel doesn't help much either - the amount of texture data is still similar.

 

It was thought that using this feature would allow the user to produce the same high visual quality at about 20% less file size, but such claims have since been revised.

 

MPEG Matrices

TBA

Other Settings

 

Lumi masking - as used by XviD
Lumi masking is the first 'psychological' innovation in XviD; it is supposed to make use of the fact that the human eye tends to notice encoding errors less if they happen in very dark or very bright parts of the picture. XviD is - in contrast to DivX - capable of using a different quantizer for each macro-block. Lumi masking compresses very dark or very bright areas more strongly than medium ones, so it will use fewer bits on some frames in the second pass than in the first pass. The saved bits are of course spent again, and that way we gain a bit of quality in the medium-brightness parts of the picture. As it is experimental, you may sometimes notice more blocks than when it is disabled.
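
The per-macro-block decision can be pictured with the sketch below. It is not XviD's actual algorithm: the function name, the brightness thresholds and the size of the quantizer increase are all assumptions chosen for illustration. The only fixed point is that Mpeg4 quantizers run from 1 to 31.

```python
def masked_quantizer(base_quantizer, block_luma, dark=40, bright=215, boost=2):
    """Raise the quantizer for very dark or very bright macro-blocks.

    dark, bright and boost are illustrative values, not XviD's real settings.
    """
    average = sum(block_luma) / len(block_luma)
    if average < dark or average > bright:
        # Compress harder where errors are less visible, staying within 1 to 31.
        return min(base_quantizer + boost, 31)
    return base_quantizer


print(masked_quantizer(4, [10] * 256))    # very dark 16x16 macro-block
print(masked_quantizer(4, [128] * 256))   # medium-brightness macro-block
print(masked_quantizer(30, [250] * 256))  # very bright macro-block near the limit
```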

 

Psychovisual Enhancement - as used by DivX
Using knowledge of how the Human Visual System (HVS) perceives moving video, the psychovisual modelling techniques of DivX Pro reduce data allocation in areas where the human visual system does not notice it, and spend that data where the human eye is especially sensitive. A unique Psychovisual Complexity Rating (PCR) is created for each frame and used to decide how the data should be properly applied. The result can be an improvement in data compression of up to 20% over certain sequences of video images.

Psychovisual modelling was previously confined to the realm of academic research labs, but DivX 5.0 and DivX Pro 5.0 are the first widely distributed video technologies to make use of this innovative technique.

Chroma Optimiser - as used by XviD
The chroma optimiser reduces PSNR by its nature. The mathematical deviation from the original picture gets bigger, but the subjective image quality rises (the 'stair step' artefacts are reduced).
That said, some people have reported problems with complete colour mismatches.

 

Pre-processing - as used by DivX
DivX5 Pro includes special (patent-pending) pre-processing algorithms that have been designed to deal with video noise in the source material, often referred to as specks, snow, or hair within a video. Snow is commonly seen when watching analogue television broadcasts.

Generally speaking, noise is a big problem when it comes to compressing video, because a lot of data will be used to capture the video noise that really shouldn't be there in the first place.

The DivX Pro noise pre-processing filter uses digital signal processing techniques to remove the noise from the source material before it is encoded and compressed. Both spatial and temporal filtering are employed, not only to remove noise from a given frame but also to ensure the same noise doesn't reappear in subsequent frames.
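
The difference between spatial and temporal filtering can be sketched in general terms as follows. The DivX Pro filter itself is not documented here, so this is a generic textbook-style illustration with hypothetical function names: a median filter within one frame, and a blend with the previous frame.

```python
from statistics import median


def spatial_filter(row):
    """3-tap median along a row of pixels; removes isolated specks within one frame."""
    out = list(row)
    for x in range(1, len(row) - 1):
        out[x] = median(row[x - 1:x + 2])
    return out


def temporal_filter(previous_row, current_row, strength=0.5):
    """Blend each pixel with the same pixel from the previous frame."""
    return [
        round(c * (1 - strength) + p * strength)
        for p, c in zip(previous_row, current_row)
    ]


print(spatial_filter([10, 10, 200, 10, 10]))       # a single speck in the middle
print(temporal_filter([100, 100], [100, 120]))     # a pixel that flickers between frames
```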

DivX Pro provides four levels of noise pre-processing, from "Light" to "Extreme," that can be used depending upon the specific need of your source material. Older movies are especially notorious for including video noise, so using the noise pre-processing features of DivX Pro can result in dramatic effects in file size reduction and quality.

 

De-interlacing
Interlaced video is most commonly found when encoding from captured analogue television broadcasts, or content that is created by consumer camcorders. Interlaced video consists of 50/60 "fields" per second, rather than the 24 (for film) or 25/30 "frames" per second that many consumers are familiar with. The 50/60 fields are shown so fast in sequence that the odd and even fields blend together and the human eye doesn't really notice the difference.

 

If you convert interlaced video into non-interlaced (ie progressive) video on a computer, you may find that when the two adjacent fields are put together to create one frame, that the information in those fields might not quite line up, and you get a "smearing" or "tearing" effect that degrades visual quality.
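
The sketch below shows where the effect comes from and one very simple way of softening it. Two fields captured at slightly different moments are woven into one frame, then each line is averaged with the line below it. This 'blend' approach is only one basic technique, used here for illustration; it is not claimed to be what any particular codec does, and the function names are hypothetical.

```python
def weave(top_field, bottom_field):
    """Interleave two fields line by line into one frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(list(top_line))
        frame.append(list(bottom_line))
    return frame


def blend_deinterlace(frame):
    """Average each line with the line below it to soften the mismatch between fields."""
    out = []
    for y, line in enumerate(frame):
        below = frame[y + 1] if y + 1 < len(frame) else line
        out.append([(a + b + 1) // 2 for a, b in zip(line, below)])
    return out


# A vertical edge that moved one pixel to the right between the two fields.
top = [[0, 0, 255, 255], [0, 0, 255, 255]]
bottom = [[0, 0, 0, 255], [0, 0, 0, 255]]

woven = weave(top, bottom)
for line in woven:
    print(line)  # alternate lines disagree about where the edge is
for line in blend_deinterlace(woven):
    print(line)
```

In the woven frame the edge zig-zags from line to line. After blending, the lines agree with each other, at the cost of a slightly softer edge.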
 

In the main, video that is created on computers is progressive. So it really is beneficial to convert 50/60 'field' interlaced material into progressive 25/30 'frames' material.

 

Codec manufacturers use special algorithms to minimize the smearing and tearing effects that could result. But beware, because de-interlacing tools can also be found and used in many external or front end encoding applications (such as MPEG Mediator, VirtualDubMod, Gordian Knot). And if both de-interlacing tools are activated at the same time, serious problems can and will occur.

 

Packed Bitstream

Helps to permit B-frame decoding without delay.

 

When enabled, P-frames and B-frames are 'packed together' into one bitstream: [I][PB][B][Empty][PB][B][Empty][P]. Packed bitstream was first introduced into the encoding process with the launch of DivX5.0.1. XviD offers manual selection.
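
One way to read the bracketed pattern is sketched below. Each reference frame is placed in the same chunk as the first B-frame that depends on it, and an empty chunk is added afterwards so that the number of chunks still matches the number of frames. This is an interpretation of the pattern quoted above rather than a description of any encoder's code, and the function names are hypothetical.

```python
def pack_chunks(display_types):
    """Group frame types (given in display order) into chunks, one chunk per displayed frame."""
    chunks = []
    pending_b = 0
    for frame_type in display_types:
        if frame_type == "B":
            # B-frames wait until the reference frame that follows them is known.
            pending_b += 1
            continue
        if pending_b == 0:
            chunks.append([frame_type])
        else:
            chunks.append([frame_type, "B"])  # reference frame packed with the first B-frame
            chunks.extend(["B"] for _ in range(pending_b - 1))
            chunks.append(["Empty"])  # placeholder keeps the chunk count equal to the frame count
            pending_b = 0
    # Trailing B-frames with no later reference frame are ignored in this sketch.
    return chunks


def show(chunks):
    """Format chunks in the bracketed style used above."""
    return "".join("[" + "".join(chunk) + "]" for chunk in chunks)


# Display order with two consecutive B-frames between reference frames.
print(show(pack_chunks("IBBPBBPP")))
```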

 

Closed GOV

Closes every 'group of pictures' before opening a new key-frame (I-frame).

Note

The above information has been compiled from a variety of sources, including DivX.com and Bond (Doom9.org) - Many thanks

Last Updated

Mon 09 Nov 04 @ 18:15 - Various changes