Storing user files: the BLOB vs filesystem debate comes to GTPlanet

  • Thread starter Thread starter GilesGuthrie
  • 18 comments
  • 1,381 views

GilesGuthrie

Staff Emeritus
Messages
11,038
United Kingdom
Edinburgh, UK
Messages
CMDRTheDarkLord
This comes from work that Pako and I are doing on the same code blocks for independent projects.

Both of us have a requirement to take files from users, and store them within our web sites for later retrieval. We're both using PHP and MySQL, server DB space and disk space are not a problem for us at the moment, but as usual, we're concerned about the future scaling implications of decisions that we take now...

The core question is:
Is it better to take the file and store it inside a database as BLOB data, or is it better to copy the file to the web server's filesystem, and store pointers to that file in the database?

The arguments for BLOB storage are:
- ease of addition
- ease of retrieval
- no need to worry about filenames
- no need to worry about directory access permissions

The arguments for filesystem storage are:
- Performance scales better
- No dedicated code required to send file to browser
- Large amounts of files will not affect total database size

Now, before everyone jumps down my throat, I know that both of these sets of advantages can be worked around, but I'm looking to get a handle on what people feel is the best way of going about this. I'd especially like to hear from people who are running databases with large amounts of BLOB data, and/or people who are using the filesystem storage model, and how that affects things such as backup.

This is your mission for the day! :)
 
I think there's a couple 'o factors needed in order to find what will work for you.

  • How many files will there be (A guestimate).
  • How big files will be.
  • Are the files likely to be stored forever or will they be removed xx days after last access. (DB Size)
  • Preferred backup method / needed window vs. time offline.

And so the list goes on..
 
Shannon
Yes, I know this. However, the more people who start contributing to this forum, the greater the chance of someone deciding to have a go.

Flerbizky
Speaking for myself (Pako will need to speak up if he's going another way):

  1. 500 - 1500
  2. One set no bigger than 35K, one set around the 150-200K mark
  3. Forever
  4. Usually a SQL dump, so can be done online, but needs to be transferred across the net.
    [/list=1]
 
Originally posted by GilesGuthrie
Flerbizky
Speaking for myself (Pako will need to speak up if he's going another way):

  1. 500 - 1500
  2. One set no bigger than 35K, one set around the 150-200K mark
  3. Forever
  4. Usually a SQL dump, so can be done online, but needs to be transferred across the net.
    [/list=1]

  1. I'll give it a think while shopping with the GF and get back to you when I get home in a few hours..
 
Originally posted by risingson77
So it's raw data that exists outside of the server's filesystem?

Yes. If you open a JPEG file in Notepad, you get a stream of characters. This is what goes into the database into a BLOB field.

Then you create a PHP file that pulls the data out of the BLOB field, combines it with the relevant headers and fires it at the browser. Sounds complex, but PHP has all the functions built in, so it isn't.
 
In my readings, something to really consider is the max file size allowed. Take a thumbnail, or small image file <100k, not a big deal in my estimation to have the data stored into a BLOB field. Now consider WAV or MP3 files or other large format files...., I could see where these types of large data files 'clogging' up the database. Another thing to consider, when retrieving this large information, the download process of a file of this size keeps the database open until the download is complete, whereas, if the file were stored into a accessable directory (not at root, ;)) then the database would only need to remain open until the instruction is sent to the server to get the specified file from filepath resident in the DB.

Things I like about BLOB's is that the organization of the files seem to be a lot more logical as they are 'tied' to a specific record.

Something to consider, I read somewhere that some systems have a files size limit of 2gb's.... If your considering using BLOB's, I would find out if this limitation effects you.
 
Originally posted by Pako
Things I like about BLOB's is that the organization of the files seem to be a lot more logical as they are 'tied' to a specific record.

Something to consider, I read somewhere that some systems have a files size limit of 2gb's.... If your considering using BLOB's, I would find out if this limitation effects you.
By wildly assuming Giles' MySQL is located on a linux with 2.4.x or newer kernel, a 4Gb file should be of no problem. One of the DV tapes we recorded in Seattle is currently on my Samba server preserving a healthy 6.7Gb..

And with Giles' specified number of files + filesizes, a little headwork and it'll come to 500Megs tops so filesize is the least of his headaches right now ;)
 
Originally posted by Concept
Giles and Pako, what are your IQ's?

I'm not saying Giles and Pako don't have high IQs (because they obviously do :) ), but most of this stuff is logical thinking combined with plain ol' rote leaning. If you do it enough, you'll get good.
 
Does BLOB make the page faster? Since it doesn't have to convert the file format between the server and the home PC?

Or am I way off here? :D
 
Originally posted by Flerbizky
I think there's a couple 'o factors needed in order to find what will work for you.

  • How many files will there be (A guestimate).
  • How big files will be.
  • Are the files likely to be stored forever or will they be removed xx days after last access. (DB Size)
  • Preferred backup method / needed window vs. time offline.

And so the list goes on..

For me, one project will be so small and insignificant, that going the Blob way is the easier way to go...

However, I'm currently working on a concept that would be as follows:

  • 400+/year
  • On average, 5-8 mbs (Streaming Audio in some instances)
  • Stored forever
  • Undecided. In the past, for backing up MsSQL DB's, it seems to be that I had to actually stop the service in order for it to back, until I got a new backup utility. I haven't gotten far enough into the game to know what my options are with MySQL to know what backup conventions are available.
 
Originally posted by Pako
For me, one project will be so small and insignificant, that going the Blob way is the easier way to go...

However, I'm currently working on a concept that would be as follows:

  • 400+/year
  • On average, 5-8 mbs (Streaming Audio in some instances)
  • Stored forever
  • Undecided. In the past, for backing up MsSQL DB's, it seems to be that I had to actually stop the service in order for it to back, until I got a new backup utility. I haven't gotten far enough into the game to know what my options are with MySQL to know what backup conventions are available.
For the last project.. Forget about BLOBs... Files and pointers would be the way to go on that one. Storing say 1500 6Mb files in a (or more) DB file (s) would make just finding the needed file IN the file slower that ketchup. Not to mention backup. And what if a disk ****s itself. Or God forbid the DB file corrupted :irked:
 
If you're looking to store a ton of images, i would not reccoment BLOB. I was going to go that way when i was creating an administration script for a gaming website (which sadly, never launched) and i too realized that it would slow down the whole data retrieval process alot.

Storing images in a file system is much more trustworthy, and you wont lose your files if the database is corrupted. Also, when backing up your database, it wont take hours like it would with BLOBed images because of the excessive size of the database.

Pako, you mentioned streaming audio? Imagine how much load that would put on the database itself if you had a ton of users online streaming that audio. It'd be horrible, and it would possibly cause data retrieval problems due to the lag caused by all that streaming data straight from the database.

This is just my personal opinion, but i say filesystem is the way to go.
 
Seems i killed this thread with that post. I was just browsing through this forum to see what i missed, and i just thought of something.

If you wanted to stream audio, but you wanted to store the audio in the database (still causing excessive database size, but whatever floats your boat), you could have a session start whenever a user wants to listen, and then have a script that reads the data from the database immediately, and then takes that data and stores it as a temporary audio file on the server, with the name of the user's session. This way, each user would technically be streaming a different file, and there would be no load whatsoever on the database when someone wants to listen in.

I don't know what made me bring this old thread up, but i just thought i'd throw my idea in.

As far as having problems with filenames, you could just store the filename (edited or not) in its own column and then store that file with the same filename oin a specified folder in your filesystem.

Though, in my opinion, i think that file system still prevails. :)
 

Latest Posts

Back