EME 6209 – Video File Formats/Compression

Publishing/Rendering your Videos (Intro to File Formats)

Once you have created your videos, you will probably want to post them somewhere… YouTube and Vine are two choices. Even if you simply want to play them on your own computer in your classroom, you need to understand file formats. Here are a few pointers to get you started:

  • Uncompressed videos are usually too large to play on computers. An uncompressed video can take up as much as 2 gigabytes per minute, making it impractical to handle on most computers purchased for the classroom. Also, you need to know that formatting a video for playback on a television set is different from formatting it for a computer; the file formats are not the same. All editing packages attempt to make this process easier by using menus to choose which format you want. In some cases, if the video was produced properly, it can be played on both a television and a computer… that is, if the computer has a good DVD player (like the Macs).
  • Both Microsoft and Apple have their own proprietary output formats. For Windows, it is .wmv files; for Macs, it is usually .mov or .mp4 files. Microsoft wants you to use its Media Player to view the videos; Apple wants you to use QuickTime. Not all computers configured for the classroom have either or both of these players on them. At home you can simply download the players (they are both free), but at school you may have to get permission to do this or have your tech coordinator do it for you… (this is another story that we could do a whole module on, but I won’t go there).
  • Not all .wmv or .mov files are created equal. We will cover this in more detail in the file format lesson, but for now, understand that this is based on the compression routines (CODECs) used to compress and decompress the files to make them smaller. Often you will need to find a plug-in for the video player that recognizes that CODEC. There are attempts being made to standardize all of this, but it takes up to seven years of testing to come to terms with each version of the standard. We are up to version seven now, but as of now, only versions 1-4 have been finalized… So, we are still looking at file incompatibilities.
  • In case you are lost on all of this, simply know this: when you do video editing, the raw form of the video you are working on is called the project file. All of the subsidiary files, images, and audio remain in their original resting place. (Just as with PowerPoint, for example, the application “points” to the original file but does not embed the actual video into the ppt file.) So, what you have to do to complete the project for viewing is to ‘render’ the project to its final compressed version (.wmv, .mov, etc.). With this process the project is compressed and all subsidiary, linked files are actually embedded into one ‘flat’ file. When you turn in your completed project, I am going to be asking for the finalized file and NOT the un-rendered project file…
  • So when creating your projects, make sure you ‘finish the job’ by rendering them according to the specifications…
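To get a feel for the numbers behind the first bullet above, here is a quick back-of-the-envelope calculation (the frame size, color depth, and frame rate below are illustrative assumptions for standard-definition video, not exact specs):

```python
# Back-of-the-envelope size of one minute of uncompressed SD video.
# Assumptions: 720x480 frame, 24-bit color (3 bytes/pixel), 30 fps.
WIDTH, HEIGHT = 720, 480
BYTES_PER_PIXEL = 3
FPS = 30
SECONDS = 60

bytes_per_minute = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS * SECONDS
gigabytes = bytes_per_minute / 1e9
print(f"{gigabytes:.2f} GB per minute")  # roughly 1.87 GB
```

At close to 2 GB per minute, a ten-minute uncompressed project would approach 19 GB, which is why rendering to a compressed format is not optional.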

File Management

File management remains one of the most confusing issues facing those involved with multimedia production. How often has it happened that you save a video as a .mov file, thinking it will be playable on all media players that accept .mov files, only to find out that it does not play on someone else’s machine? The guiding rule is that not all formats are created equal. This is because of the CODEC (compression-decompression routine). In order for you to become familiar with all of this, take a look at the following links. Much of the material repeats itself, but each link presents a different ‘wrinkle’ on the subject. Afterwards, I present some notes and a video I made for another production course I taught (which explains the lesson 9B designation). Rather than recreate it, I decided to leave it the way it is because I think it presents a pretty good, relatively quick overview of how MPEG processing works. It may be a lot more about this subject than what you will need, but after you absorb the content, you will have a good consolidated overview of the topic of video compression.

Compression Theory

This section is a set of notes that explains the motivation and conceptual theory behind video compression.

Motivation

The need to compress video can be boiled down to three specifics:

  • Uncompressed video and audio data are huge. In HDTV, the bit rate easily exceeds 1 gigabit per second. This translates into big problems for storage and network communications:
    • Even standard-definition video captured into digital format is big: 1 second of captured video can equal 27 megabytes (MB).
  • The compression ratio of the so-called ‘lossless’ methods (e.g., Huffman, Arithmetic, LZW, etc.) is not high enough for image and video compression, due to:
    • bandwidth constraints
    • the distribution of pixel values being relatively flat: most areas of a video differ from one another, leaving little redundancy to exploit. The conceptual design of compression is that you only have to send actual pixel values when they change.
  • Humans tolerate a certain amount of loss (for example, humans do not notice subtle changes in color tones), and that loss can be predicted before it is noticed.

Video compression is simply compacting the digitized video into a smaller space.
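You can see the ‘flat distribution’ problem for yourself with Python’s standard-library zlib module (a lossless compressor in the LZ77/Huffman family). The data below is synthetic, standing in for noisy video versus a uniform sky:

```python
import random
import zlib

random.seed(0)

# "Flat" distribution: every byte value equally likely (like noisy video).
noisy = bytes(random.randrange(256) for _ in range(100_000))

# Highly redundant data: one long run of identical values (like a clear sky).
redundant = bytes([128]) * 100_000

for name, data in [("noisy", noisy), ("redundant", redundant)]:
    packed = zlib.compress(data, level=9)
    print(f"{name}: {len(data)} -> {len(packed)} bytes "
          f"({len(data) / len(packed):.1f}:1)")
```

Random (‘flat’) data barely compresses at all, while redundant data collapses dramatically — which is why video needs lossy, perception-based methods on top of lossless coding.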

Choices of Codecs

Current Standards… yes, the industry is attempting to standardize:

  • Motion – JPEG
  • MPEG I
  • MPEG II (even though there exists more recent formats, this is the latest ‘standardized’ format)

Products (the following are CODECs, NOT software applications) (and most of them you probably haven’t even heard of):

  • Captain Crunch Media Vision
  • Cinepak
  • DVI-RTL (2)
  • Indeo
  • Pro-Frac
  • SoftVideo
  • Ultimotion
  • Video Apple
  • Video Cube
  • Sorenson
  • MotiVE Media Vision
  • h.264

On the other hand, QuickTime is a software application that uses one or more of the above CODECs (currently, h.264). Like everyone else, Apple is developing its own set of CODECs in hopes that they will become the standard.

Techniques/Strategies

  • Source Coding = encoding the output of an information source into a format that can be transmitted digitally to a receiver as a series of code words such that the average length of the code words is as small as possible.
  • Data Compression = reduction in redundancy …made possible and made necessary by digital transformation (this is what, in effect, jpeg does).

Digitizing, by itself, can add to file sizes:

  • a 4 MHz channel (the space allocated for each television channel) is capable of transmitting 8 million analog samples of a picture per second. The size of a digital representation of the same analog image is increased 8 times (based on a one-bit representation).

Two alternative strategies have been developed:

1- One approach is to throw away “unnecessary” information from individual frames (called intra-frame compression). This is known as ‘lossy’ compression.

This can be done at the camera level…

For example, the Digital Betacam reduces data by half (a 2:1 ratio); a DV camera compresses at 5:1. This produces a clean, low-noise signal where every frame is the original signal and is stored on tape or disk, so we can easily isolate frames when needed to edit the material.

The above is an example of Motion-JPEG compression (a very similar technique to that used with still photos).

2- Then we have MPEG (Moving Picture Experts Group)… this compression method looks at adjacent frames to see which pixels are changing between frames… while visual data may be lost, its advantage is that it is also designed to synchronize with the audio much better.

MPEG files (often carrying the .mpg extension) have become the ‘de facto’ standard compression format (view the video below for more details…)

Required compression ratios for packaged television via commercial channels.

Look at the chart below. For each of the channel types, the chart shows the required compression ratio needed for that channel to properly handle the selected service at a lossless level. For example, for a PC LAN to service an HDTV broadcast, a 31,000:1 compression ratio would have to be attained; film-quality video would require a compression of 76,000:1, etc. For purposes of this example, we considered cable modems and DSL (e.g., CenturyLink) to be ‘virtually’ equal at the top end when, in fact, there might be differences… this chart is for illustrative purposes only…
Channel              Bit Rate      NTSC TV      HDTV        Film Quality
                                   (168 Mb/s)   (933 Mb/s)  (2300 Mb/s)
PC Local LAN         30 kb/s       5,600:1      31,000:1    76,000:1
Modems               56 kb/s       3,000:1      17,000:1    41,000:1
ISDN                 64-144 kb/s   1,166:1      6,400:1     16,000:1
Cable Modems         10-20 Mb/s    30:1         150:1       200:1
Electrical Outlets   varies        varies       varies      varies
T-1/DSL              1.5-10 Mb/s   112:1        622:1       230:1
T-3                  42 Mb/s       17:1         93:1        54:1
Fiber Optic          200 Mb/s      1:1          5:1         11:1
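The ratios in the chart come from simple division: the service’s bit rate divided by the channel’s capacity. A quick sketch (the bit rates are the chart’s illustrative figures, expressed in kilobits per second):

```python
# Required compression ratio = (service bit rate) / (channel bit rate).
# All figures below are the chart's illustrative numbers, in kb/s.
services = {"NTSC TV": 168_000, "HDTV": 933_000, "Film": 2_300_000}
channels = {"PC Local LAN": 30, "Modem": 56, "Cable Modem": 20_000}

for ch_name, ch_rate in channels.items():
    ratios = {s: rate / ch_rate for s, rate in services.items()}
    line = ", ".join(f"{s} {r:,.0f}:1" for s, r in ratios.items())
    print(f"{ch_name}: {line}")
```

For the PC LAN / HDTV cell, 933,000 ÷ 30 ≈ 31,000:1, matching the chart.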

How to decide which video CODEC is necessary…

You first need to define the Distribution Medium and Playback Platform:

  • Know the CODEC’s availability on the multiple platforms you plan to distribute your software.
  • Know the CODEC’s ability to adapt the synchronized playback speed to the available hardware without user interference.
  • Weigh the developer issues (is a slower compression ok?)
  • Know the source of the video and whether it has previously been compressed. (Yes, compressing an already compressed video can result in a LARGER video!)
  • Know the type of video you are producing: how much motion? color? image size? how much activity? sound? camera moves?

Three criteria involved in selecting a specific CODEC

  1. the Compression Level,
  2. the Quality of the Compressed Video
  3. the Compression/Decompression Speed

Technical discussion: there are two techniques you can use: intra-frame (within a frame) and inter-frame (between frames).

How the two alternative compression approaches work (understanding the concepts below is key to your understanding of how compression works in general):

  • Intra-frame Compression

– takes advantage of redundancy within a picture (for example, a picture of a sky has a lot of blue in it)

– takes advantage of human limitations (humans notice changes in luminance ten times more readily than changes in color).

  • Inter-frame Compression

– takes advantage of redundancy in a sequence of pictures: at 30 frames per second, each subsequent frame is going to appear a lot like the previous one.

– also uses intra-frame techniques

Two important Compression Terms:

  1. Lossless = allowing exact recovery of the source data
  2. Entropy = the smallest average code word length (i.e., the smallest predictable size) without substantially changing the context of what is being shown… (the visual language of the moving image…)

Notes/Review:

Entropy is a measure of the smallest average information content per source output unit (i.e., the lower limit in bits per pixel).
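Entropy can be computed directly. The sketch below uses the standard Shannon formula on synthetic byte data (the sample values are made up for illustration):

```python
import math
from collections import Counter

def entropy_bits(data: bytes) -> float:
    """Shannon entropy: the theoretical minimum average bits per symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

flat = bytes(range(256)) * 10                  # every value equally likely
skewed = bytes([0]) * 2550 + bytes([1]) * 10   # a few values dominate

print(entropy_bits(flat))    # 8.0 bits/symbol: no room for lossless savings
print(entropy_bits(skewed))  # well under 1 bit/symbol: highly compressible
```

A flat pixel distribution sits at the full 8 bits per symbol, which is why lossless coding alone cannot reach the ratios video delivery requires.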

– In order to accomplish an equal broadcast quality level (based on how analog samples are transmitted), it takes a 4:1 compression to transmit a monochrome digital signal.

– Color requires an additional 50-200%

Theoretical Review

Compression Principles

1. Data redundancy – sample values are not entirely independent… neighboring values are somehow correlated (both audio & video).

Because of this, a certain amount of predictive coding can be applied.

2. Voice: there is a lot of dead space (silence removal).

3. Images: neighboring samples are normally similar (spatial redundancies removed through transform coding)

4. Video: the sequence of images is normally similar

Two basic methods:

  • Lossless – preserves all data, but is inefficient.
  • Lossy – some data is eliminated. But as most images contain more data than what the eye or ear can discern, this can be unnoticeable. However, as file sizes get smaller, loss can be detected. Lossy is better suited than lossless for delivery on movable storage and over networks.

Methodology:

  • Spatial – applied to a single frame, independently of any surrounding frames. (intra frame) (jpeg)
  • Temporal – identifies differences between frames and stores only those differences (inter frame). Also uses intraframe technology to establish keyframes.
  • Keyframe – the reference frame for the inter-frames that follow. (Most editors’ “scrub” controls can only jump to keyframes.) Note: increasing the number of keyframes increases file sizes.

Factors to be considered:

Handling color

Compression is handled through space reduction (two factors.. humans perceive each of these differently):

– Luminance (brightness)

– Color (chrominance)

These are accounted for separately because the human eye notices differences in luminance at a rate almost ten times higher than changes in color. (For 16-bit color, the image is divided into 16×16 blocks; for 32-bit, 32×32, etc.) That is why you see component HD cables on the back of your TV set: the signal is split so that luminance travels separately from the color signals.
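Handling luminance and chrominance separately pays off immediately in file size. The arithmetic below shows the common case where each chroma plane is stored at half resolution in each direction (4:2:0 subsampling — a standard scheme, though the notes above do not name it):

```python
# Why storing chroma separately saves space: the eye tolerates lower
# color resolution, so the two chroma planes can be sampled at half
# resolution in each direction while luminance stays at full resolution.
W, H = 720, 480

full_rgb = W * H * 3               # 3 bytes per pixel, no subsampling
luma     = W * H                   # full-resolution Y (luminance) plane
chroma   = 2 * (W // 2) * (H // 2) # two quarter-size chroma planes

subsampled = luma + chroma
print(full_rgb / subsampled)       # half the data before any other compression
```

That 2:1 saving comes purely from exploiting the eye’s lower color sensitivity, before any transform or entropy coding is applied.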

Terms

Time to encode

Symmetric = same amount of time to encode and decode

Asymmetric = encoding is not done in real time (based on the frame rate, size, and data rate of the video)

Factors in determining compression time are:

  • Frame size
  • frame rate
  • encode rate (Variable bit rate takes longer)

Data Rate

– Should be maximized for the targeted delivery channel.

  • CD-ROM = 200kb/sec
  • Internet = 1.5 to 50 kb/sec
  • Higher data rate = higher quality (e.g., formula = H×W×FPS/48000) (the chosen rate should be between ½ and double the result above)

– Also affected by the amount of action within the frame. The trick is to reach this ceiling limit with a lower rate so compression is more efficient.
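Plugging numbers into the H×W×FPS/48000 rule of thumb above (the notes give no units, so treat the result as a ballpark figure, not a spec):

```python
# The rule-of-thumb formula from the notes: H x W x FPS / 48000.
# The acceptable range is between half and double the result.
def target_data_rate(width, height, fps):
    return width * height * fps / 48000

rate = target_data_rate(320, 240, 15)   # the recommended playback settings
print(rate)                             # 24.0
print(rate / 2, rate * 2)               # acceptable range: 12.0 to 48.0
```

At the recommended 320×240 / 15 fps playback settings, the formula lands near the ~70 KB/s-or-less ceiling suggested later for lower-end machines.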

Contrast Sensitivity is handled through space reduction.

–Luminance (brightness)

–Color (chrominance)

Humans tend to notice differences in luminance more readily than they do chroma..

Humans tolerate more loss with color than with monochrome.

–The eye possesses a lower sensitivity to high and low spatial frequencies than to mid-frequencies.

The implication is that perfect fidelity of edge contours can be sacrificed to some extent (these are high-spatial-frequency brightness transitions).

Humans can detect differences in luminance intensity levels down to approximately 1% of peak white within a scene (= a 100:1 contrast ratio).

–Therefore the math behind compression does not have to be linear.

–Also affected by the viewing ratio (how far away from the screen the viewer normally sits).

Delivery Mechanism

What is your viewing audience going to use to playback the video?

–CD-ROM?

–Internet/Intranet?

–Live?

Power/performance of playback machine

  • Lower end machines cannot handle higher data rates.
  • Factors are frame rate, data rate and frame size

Summary: Choosing a CODEC

General considerations

  • Method used for delivery
  • Audience’s configuration
  • Data Rate – should be maximized for the targeted delivery channel.

Also affected by amount of action within the frame. The trick is to reach this ceiling limit with a lower rate so compression is more efficient.

Also need to take into consideration the power/performance of the playback machine: lower-end machines cannot handle higher data rates.

In summary: the factors are frame rate, data rate, frame size, and number of keyframes.

PLUS:

Delivery mechanism – what is your viewing audience going to use to playback the video?

Performance Measurements:

Compression Ratio:

1. The ratio between the original data and the data after compression.

– A higher ratio is not always desired… it depends on the quality of the reconstructed data.

2. Compression speed and complexity are also considerations, making asymmetric CODECs more desirable in some cases: a live feed must encode in real time, while for archived video (e.g., a QuickTime file) a slow encode does not matter or interfere with viewing.

Summary of requirements for Playback/File-size

  • Data rate – needs to be 70 kilobytes per second or less for lower-end machines
  • Frame size – 320×240 or less recommended
  • Frame rate – 15 frames per second; less for low-end/slow machines
  • Doubling – near-full-screen playback requires a larger file size
  • Number of keyframes (we will cover this later)
  • CPU alternatives – allows you to produce several versions at different rates
  • Playback scalability – drops every other frame
  • Number of transitions
  • Amount of action within a frame
  • z-axis (look this term up if you do not know it)

Methods can also vary by type of video

For training videos (where there is usually little more action than two talking heads):

  • They usually compress very well at lower data rates because of the lower amount of action.
  • Can compress at higher data rates for CD-ROM and broadband

The video lesson begins here… we will now look at MPEG coding

The following are notes extracted from the video for your review

The theory behind MPEG:

In order to understand the concepts discussed in this video, you need to understand this one underlying principle:

Compression yields even more compression due to various redundancy/predictive methods

Standards?

While the codec products we have been describing (Cinepak, Indeo, Sorenson) are widely available, none of them is any more than a ‘de facto’ standard developed by a private company in hopes that it would become widely used.

Several international standards for compressed digital video, grouped together under the name MPEG, were developed by committees of experts (many of whom worked for these private companies and participated in order to ‘lobby’ for their specs to be included).

MPEG Choices

  • Motion-JPEG
  • MPEG-1 (VHS quality to CD)
  • MPEG-2 (Digital Video) (DVD)
  • MPEG-3 (HDTV)
  • MPEG-4 (added more audio support) (latest approved standard)
  • MPEG-5 (better compression)

We are up to MPEG-7 (interactivity)…

The only standards formally issued to date are 1, 2, and 4 (it takes seven years to formalize a standard… that is why not even all MPEG videos of the same version are created equally: developers take the guidelines as they are issued and play around with them, tweaking as they go…)

MPEG uses prediction to exploit potential redundancy and approach the statistical entropy limit:

  1. Separate the pixel values into a limited number of data clusters (e.g., pixels whose color is clustered near sky blue or grass green or flesh tone or the color of clothing in the image, etc.).
  2. Send the average color of each cluster and an identifying number for each cluster as side information.
  3. Transmit, for each pixel (this is where prediction comes into play): the number of the cluster whose average color it is close to, and its difference from that average cluster color.

This method will yield an approximately 2:1 lossless reduction.

Simple Differential Predictive Coding

Two values are sent:

  • The predicted pixel value (the value of the preceding pixel)… prediction assumes nothing has changed unless otherwise indicated.
  • Prediction error (difference between the predicted pixel and the actual pixel)

Because these are sent as entropy codes, there is a reduction… recall, these codes do not have the 4:1 requirement, as they are sent as side information. Side information is not a graphical format but, rather, information such as pointers or algorithms that aid in the reconstruction of the image during decompression, therefore cutting down on bandwidth (i.e., file size) requirements.

Uses Frame Differential Coding

  • Prediction from a previous video frame
  • Requires storage of a frame of video in the encoder for comparison
  • Good for still images

Motion compensated prediction

Notice that the word ‘prediction’ is often used here?

  • Compares the present pixel to the location of the same object in the previous frame
  • Estimates the motion to make the prediction
  • Sends the motion vector and prediction error as side information.

MPEG Compression Technology

Three types of frames: I, P, and B

– I-frames are intra-frame encoded

– P-frames use forward prediction from I or P frames

– B-frames can use forward and backward prediction from I or P frames

Inter-frame Techniques

Simplest

– treats each image independently

Differences

- View each frame as an adaptation of previous frame – store changes in color at each pixel – the number of changes can be large even if the changes themselves aren't large.

Motion Compensation.

- Indicate motion of camera – store error pixels – This technique won't compensate for characters moving within a scene.

Block Motion Compensation.

- Break scene into blocks – indicate motion of each block from last scene

Techniques used when a scene changes

- Analyze newly uncovered information due to object motion across a background, or at the edges of a panned scene.

To handle this, MPEG uses I-frames as start-up frames (sent twice per second).

It sends B-frames to reduce the data required to send uncovered information.

The frame order is changed to accomplish this:

Source order and encoder input order:

I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)

Encoding order and order in the coded bit stream:

I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)

Decoder output order and display order (same as input):

I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
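The reordering shown above follows one mechanical rule: hold each B-frame back until the I- or P-frame it predicts from has been sent. A sketch of that rule (frame labels match the sequence above):

```python
# Sketch of MPEG's frame reordering: a B-frame cannot be decoded until
# its *later* reference (I or P) arrives, so the encoder emits each I/P
# frame before the B-frames that precede it in display order.
display_order = ["I1", "B2", "B3", "P4", "B5", "B6", "P7",
                 "B8", "B9", "P10", "B11", "B12", "I13"]

def to_coded_order(frames):
    coded, pending_b = [], []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)       # hold B-frames back...
        else:
            coded.append(f)           # ...until their forward reference
            coded.extend(pending_b)   # is emitted, then release them
            pending_b = []
    return coded + pending_b

print(to_coded_order(display_order))
```

Running this reproduces the coded bit-stream order listed in the notes; the decoder then restores display order by buffering one reference frame.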

Intra-frame Techniques

The math is called:

Discrete Cosine Transform (DCT coefficients)

- Converts picture into frequency space

- Can judge which information is important. The transform is run to organize the redundancy in the spatial directions, and the result is then Huffman coded (i.e., duplicates are removed). This often results in lossy compression because some of the calculations are estimated.

Block motion vectors are Huffman encoded (another mathematical coding step)

B-frames are most useful for improved signal

In other words, for inter-frame compression, the MPEG CODEC looks for a close match to each block in a previous or future frame (there are backward prediction modes where later frames are sent first to allow interpolating between frames).

The DCT coefficients (of either the actual data, or the difference between this block and the close match) are quantized and sub-sampled, which means that you divide them by some value to drop bits off the bottom end. This is possible because a human's capacity to see things is limited.
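The quantization step described above can be sketched with plain arithmetic (the coefficient values below are invented for illustration, not real DCT output):

```python
# Sketch of coefficient quantization: divide each DCT coefficient by a
# step size and round, dropping the low-order bits the eye won't miss.
coefficients = [312, -47, 21, -9, 4, -2, 1, 0]
step = 16

quantized = [round(c / step) for c in coefficients]
reconstructed = [q * step for q in quantized]

print(quantized)       # small numbers and runs of zeros, ideal for
                       # Huffman/run-length coding
print(reconstructed)   # close to, but not exactly, the originals:
                       # this is the "lossy" step
```

Dividing and rounding is where information is permanently discarded; everything after it (run-length and Huffman coding) is lossless bookkeeping.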

MPEG advantages

  • Has become an International standard
  • Allows Inter-frame comparisons
  • Predicts redundant traits
  • More universal

Benefits of MPEG (the last ‘standard’ version is MPEG-4 (.mp4)):

  • MPEG-I could achieve 30 fps in 320×240 windows when played back on boards that cost less than $500.
  • The key question was whether developers would move to MPEG-I or stick with existing software. Most people agree that MPEG-I playback looked better, and it added the advantage of compressing audio.

In summary:

  • JPEG and MPEG are not products… only standardization techniques… JPEG is symmetrical...
  • QuickTime is a product that incorporates standardization techniques into it.

Review!

While there is really no assignment attached to this lesson, I have a series of review questions you can utilize to test your knowledge of the content in this lesson.
