I believe I read a breakdown, and the video was judged to be rendered in the game engine because parts of the scene changed to match choices made in the game (e.g. the correct clothing depending on which outfits were available, or some such, and the ability to move the character during some cutscenes).
Also, uncompressed 7.1 audio does not end up being small. We tend to think of video/audio file sizes in terms of what we download, e.g. a DivX movie or a DVD versus an MP3.
But if you recall, back in the day when we were dealing with uncompressed CDDA rips and WAV files, the file sizes got big fast, and that was only 2 channels. The same holds true today, but with 7.1 channels of uncompressed audio a lot of space gets eaten up fast. I believe DTS-HD Master Audio can run at up to roughly 25 Mbps, with current uncompressed movie audio clocking in around 6 Mbps (that's just off the top of my head, though, and could be off).
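To put rough numbers on that: raw PCM bitrate is just sample rate × bit depth × channels, so it is easy to see how fast it adds up. A minimal sketch, assuming 44.1 kHz/16-bit for CDDA and an assumed 48 kHz at 16 or 24 bits for 7.1 (8 channels); real discs and games may use other settings.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second of raw, uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


def bytes_per_hour(bitrate_bps: int) -> int:
    """Storage needed for one hour at a constant bitrate (8 bits per byte)."""
    return bitrate_bps * 3600 // 8


# CD audio (CDDA): 44.1 kHz, 16-bit, 2 channels
cd = pcm_bitrate_bps(44_100, 16, 2)
# 7.1 = 8 channels; 48 kHz at 16-bit and 24-bit are assumed example settings
surround_16 = pcm_bitrate_bps(48_000, 16, 8)
surround_24 = pcm_bitrate_bps(48_000, 24, 8)

print(f"CDDA:        {cd / 1e6:.3f} Mbps, {bytes_per_hour(cd) / 1e6:.0f} MB per hour")
print(f"7.1, 16-bit: {surround_16 / 1e6:.3f} Mbps, {bytes_per_hour(surround_16) / 1e9:.2f} GB per hour")
print(f"7.1, 24-bit: {surround_24 / 1e6:.3f} Mbps, {bytes_per_hour(surround_24) / 1e9:.2f} GB per hour")
```

That works out to about 635 MB per hour for stereo CD audio versus roughly 2.76 GB (16-bit) to 4.15 GB (24-bit) per hour for uncompressed 7.1, before any video at all.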
As for downloading videos of cutscenes: unless they are direct file rips of the original media, a video of a cutscene proves little about the original file size behind it. For instance, I could take a video capture of an hour of me playing this game, http://91.202.41.234/, and encode it as SD MPEG-2 (DVD), and you would get multiple GB, yet obviously the game did not need gigabytes of data to produce that video. The same goes for the cutscenes from MGS4: a video capture of the action tells you nothing except what bitrate that type of video compression produced for that scene, and nothing about the source. The source could have been HD FMV, or it could have been procedurally generated, or anything in between.
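The reason a capture tells you nothing about the source is that its size is basically (video bitrate + audio bitrate) × duration. Here is a minimal sketch of that arithmetic; the 5 Mbps video and 192 kbps audio figures are hypothetical DVD-style settings, and container overhead is ignored.

```python
def capture_size_bytes(video_bps: int, audio_bps: int, seconds: int) -> int:
    """Approximate size of an encoded capture: total bitrate * duration / 8.

    Note there is no input describing the source (FMV, real-time engine,
    procedural, ...): only the encoder's bitrate and the running time matter.
    Container overhead is ignored.
    """
    return (video_bps + audio_bps) * seconds // 8


# Hypothetical settings for an SD MPEG-2 (DVD-style) capture
VIDEO_BPS = 5_000_000  # assumed ~5 Mbps average video bitrate
AUDIO_BPS = 192_000    # assumed 192 kbps stereo audio track
ONE_HOUR = 3600        # seconds

size = capture_size_bytes(VIDEO_BPS, AUDIO_BPS, ONE_HOUR)
print(f"{size / 1e9:.2f} GB")
```

Under those assumptions an hour of capture comes out around 2.34 GB whether the footage came from a tiny browser game or a Blu-ray's worth of prerendered video.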