Multipart Uploads: Handling Files without Regret

Multipart uploads sound simple. Then you hit production and you get support tickets, full disks, broken sessions, and that one browser that thinks the spec is a suggestion. This is a field guide from the servlet trenches so you can ship file uploads without regret.

Problem framing

Right now everyone is talking about Ajax and progress bars. Gmail makes it look smooth and we all want that magic. Under the hood it is still the same multipart/form-data POST over plain old HTTP. On the server we have Servlet 2.4 everywhere, some early 2.5 bits brewing, and the usual suspects like Tomcat 5.5 behind Apache. The gotcha list is old, but it still bites.

The core problem is balance. You need to protect memory, disk, and request threads while giving users clear feedback. You also need to pick a library and stick to it. In Java land the workhorse is Commons FileUpload. Struts and Spring wrap it nicely, but the knobs still matter. Size limits, temp directories, and in-memory thresholds are the difference between a smooth day and a pager at 3 a.m.
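To make the threshold knob concrete, here is a minimal sketch of the idea, written for a modern JDK with no libraries: keep bytes on the heap until a cutoff, then move what you have into a temp file and keep writing there. FileUpload's disk-backed file items work in a similar spirit, but ThresholdOutputStream below is a hypothetical illustration, not that library's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical class: keeps an upload on the heap until it passes
// `threshold` bytes, then spills everything to a temp file.
class ThresholdOutputStream extends OutputStream {
    private final int threshold;          // max bytes held in memory
    private final File tempDir;           // where spilled uploads are written
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream current = memory;
    private File spillFile;               // stays null while the data fits in memory
    private long written;

    ThresholdOutputStream(int threshold, File tempDir) {
        this.threshold = threshold;
        this.tempDir = tempDir;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Switch to disk before this write would push us past the threshold.
        if (spillFile == null && written + len > threshold) {
            spill();
        }
        current.write(buf, off, len);
        written += len;
    }

    // Creates the temp file, copies the buffered bytes into it, drops the heap copy.
    private void spill() throws IOException {
        spillFile = File.createTempFile("upload_", ".tmp", tempDir);
        OutputStream out = new FileOutputStream(spillFile);
        try {
            memory.writeTo(out);
        } catch (IOException e) {
            out.close();
            throw e;
        }
        memory = null;
        current = out;
    }

    boolean isInMemory() { return spillFile == null; }
    File getFile()       { return spillFile; }
    byte[] getBytes()    { return memory == null ? null : memory.toByteArray(); }
    long size()          { return written; }

    @Override
    public void flush() throws IOException { current.flush(); }

    @Override
    public void close() throws IOException { current.close(); }
}
```

A low threshold bounds heap use per upload to a small multiple of the threshold; the price is a temp file for anything larger, which is why the temp directory needs space and cleanup.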

Three-case walkthrough

Case one: tiny avatars

Think images under one meg. Keep this path simple. Parse the multipart request, cap the request size at the web server and in the library, and store the file on disk outside the web root. Rename it to a random name and keep the original name and content type as metadata. For speed, set a small in-memory threshold so thumbnail-sized files never spool to a temp file. In FileUpload that means a small size threshold on the DiskFileItemFactory and a strict max file size on the ServletFileUpload. Add a whitelist for image types so users cannot sneak in a script.
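A sketch of the avatar whitelist and random naming, assuming a modern JDK and nothing else. AvatarPolicy and its allowed lists are illustrative, not from any library. Because the browser supplies both the file name and the Content-Type, the sketch also checks the first bytes against the JPEG, PNG, and GIF signatures before trusting either.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.UUID;

// Hypothetical avatar policy: names and the allowed lists are illustrative.
class AvatarPolicy {
    private static final Set<String> ALLOWED_EXTENSIONS =
            new HashSet<>(Arrays.asList("jpg", "jpeg", "png", "gif"));

    // Older IE versions report progressive JPEGs as image/pjpeg and PNGs as image/x-png.
    private static final Set<String> ALLOWED_TYPES = new HashSet<>(Arrays.asList(
            "image/jpeg", "image/pjpeg", "image/png", "image/x-png", "image/gif"));

    /** Accepts the upload only if extension, declared type, and leading bytes each indicate an image. */
    static boolean isAllowed(String originalName, String contentType, byte[] head) {
        if (originalName == null || contentType == null) return false;
        int dot = originalName.lastIndexOf('.');
        if (dot < 0) return false;
        String ext = originalName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = contentType.trim().toLowerCase(Locale.ROOT);
        return ALLOWED_EXTENSIONS.contains(ext)
                && ALLOWED_TYPES.contains(type)
                && looksLikeImage(head);
    }

    /** Checks magic numbers: JPEG FF D8 FF, PNG 89 50 4E 47, GIF "GIF8". */
    static boolean looksLikeImage(byte[] h) {
        if (h == null || h.length < 4) return false;
        boolean jpeg = (h[0] & 0xFF) == 0xFF && (h[1] & 0xFF) == 0xD8 && (h[2] & 0xFF) == 0xFF;
        boolean png = (h[0] & 0xFF) == 0x89 && h[1] == 'P' && h[2] == 'N' && h[3] == 'G';
        boolean gif = h[0] == 'G' && h[1] == 'I' && h[2] == 'F' && h[3] == '8';
        return jpeg || png || gif;
    }

    /** Random on-disk name; keep the original name and type as metadata elsewhere. */
    static String randomStoredName() {
        return UUID.randomUUID().toString();
    }
}
```

The stored name carries no extension on purpose, so nothing on the server is tempted to execute or render the file based on its name.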

Case two: real world documents

Docs in the ten to fifty meg range are common in intranet apps. Here you want to stream to disk fast and keep the heap clean. Use the streaming API so the servlet reads the request in chunks. Point temp storage at a partition with room to spare and monitor it. After the upload finishes, move the file to final storage in a single rename to avoid partial files; keep temp and final storage on the same filesystem, because a move across filesystems is a copy, not an atomic rename. If you do virus checks, run them after the move and quarantine on failure. Pair this with a simple progress endpoint that reports the bytes read so far, so the UI can show a bar. It will not be perfect on every browser, but your users will feel heard.
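A minimal sketch of the spool-then-rename step with a hard byte cap, using java.nio.file from a modern JDK. UploadSpooler and its parameters are hypothetical; the InputStream would be the file item's stream from whatever parser you use. With StandardCopyOption.ATOMIC_MOVE, a move between different filesystems fails with an exception instead of silently turning into a copy, which surfaces the misconfiguration early.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Hypothetical helper: spool an upload to a temp file, then publish it by rename.
class UploadSpooler {
    static class TooLargeException extends IOException {
        TooLargeException(long max) { super("upload exceeds " + max + " bytes"); }
    }

    /**
     * Copies in 8 KB chunks into tempDir, aborting once maxBytes is passed,
     * then renames into finalDir under a random name. Both directories should
     * live on the same filesystem so the move is a real rename.
     */
    static Path store(InputStream in, long maxBytes, Path tempDir, Path finalDir) throws IOException {
        Path tmp = Files.createTempFile(tempDir, "up_", ".part");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                byte[] buf = new byte[8192];
                long total = 0;
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n;
                    if (total > maxBytes) throw new TooLargeException(maxBytes);
                    out.write(buf, 0, n);
                }
            }
            Path target = finalDir.resolve(UUID.randomUUID().toString());
            return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // never leave a partial file behind
            throw e;
        }
    }
}
```

Readers of the final directory only ever see complete files, because a file appears there in one step or not at all.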

Case three: big media and archives

When users send a two hundred meg video, keep your app responsive. Put a strict cap at the front door with a web server limit (on Apache, LimitRequestBody) and repeat it in the upload library; Tomcat 5.5's maxPostSize only covers form-encoded parameter parsing, not multipart bodies. Stream from socket to disk, never to memory. Offload heavy post-processing to a queue so the request returns fast. Show a receipt page with a tracking id and have it poll a status endpoint. If you run a cluster, use sticky sessions or keep progress in a shared cache so the bar does not jump backward when a poll lands on a different node.
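The receipt-and-poll pattern can be sketched with a thread pool and a status map, assuming a modern JDK. MediaJobs, its Status values, and the in-process map are hypothetical; in a cluster the statuses would need to live in shared storage, just like progress data.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process job queue for post-upload work such as transcoding.
class MediaJobs {
    enum Status { QUEUED, RUNNING, DONE, FAILED }

    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private final ExecutorService workers;

    MediaJobs(int threads) {
        workers = Executors.newFixedThreadPool(threads);
    }

    /** Called at the end of the upload request: queues the work and returns a tracking id at once. */
    String submit(Runnable postProcessing) {
        String id = UUID.randomUUID().toString();
        statuses.put(id, Status.QUEUED);
        workers.submit(() -> {
            statuses.put(id, Status.RUNNING);
            try {
                postProcessing.run();
                statuses.put(id, Status.DONE);
            } catch (RuntimeException e) {
                statuses.put(id, Status.FAILED);
            }
        });
        return id;
    }

    /** What the status endpoint returns when the receipt page polls; null for unknown ids. */
    Status status(String id) {
        return statuses.get(id);
    }

    /** Stops accepting work and waits for queued jobs, e.g. when the web app shuts down. */
    boolean drain(long timeoutSeconds) throws InterruptedException {
        workers.shutdown();
        return workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }
}
```

Create one instance at startup and drain it when the application stops (for example from a ServletContextListener), so worker threads do not outlive a redeploy.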

Objections and replies

"We already use Struts or Spring. Can we skip all this?" Yes and no. The framework saves you from parsing, but you still have to set the max file size, max request size, and temp path, and clean up leftover temp files. Those are on you.

"SSL makes it safe." SSL protects the wire. You still need size caps, type checks, random file names, and storage outside the web root. Add a daily job that deletes stale temp files; attackers love leftovers.
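A sketch of the janitor, assuming a modern JDK and a dedicated upload temp directory. TempJanitor and its methods are hypothetical names. It keys on last-modified time, so an upload that is still streaming keeps its file fresh and survives the sweep.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical janitor for the upload temp directory.
class TempJanitor {
    /** Deletes regular files whose last write is older than maxAgeMillis; returns how many were removed. */
    static int sweep(Path tempDir, long maxAgeMillis, long nowMillis) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(tempDir)) {
            for (Path p : entries) {
                if (!Files.isRegularFile(p)) continue;
                long age = nowMillis - Files.getLastModifiedTime(p).toMillis();
                if (age > maxAgeMillis && Files.deleteIfExists(p)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    /** Runs the sweep once a day, removing files older than one day. */
    static ScheduledExecutorService startDaily(Path tempDir) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                sweep(tempDir, TimeUnit.DAYS.toMillis(1), System.currentTimeMillis());
            } catch (IOException e) {
                // Swallow (and ideally log): an exception escaping here would cancel later runs.
            }
        }, 0, 1, TimeUnit.DAYS);
        return timer;
    }
}
```

Passing the current time into sweep keeps the cutoff logic easy to test; an OS-level cron job calling the same logic is an equally good home for it.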

"We will show a progress bar like Gmail." You can show a bar, but be honest. Real progress needs server-side counters. A fake bar that just animates is fine for tiny files. For big files, count bytes on the server and expose that count over a light JSON endpoint tied to the user session.
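A minimal sketch of server-side counting, assuming a modern JDK. UploadProgress, CountingInputStream, and the session-keyed map are hypothetical names. In a servlet you would wrap the request body stream before the parser reads it (for instance through a request wrapper), register the counter under the session id, and have the JSON endpoint return toJson() for the caller's session. Later FileUpload releases include a ProgressListener interface that can feed a counter like this instead of a wrapping stream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical progress record, one per session; a cluster would keep this in a shared cache.
class UploadProgress {
    static final Map<String, UploadProgress> BY_SESSION = new ConcurrentHashMap<>();

    final long expected;                          // e.g. the request's Content-Length, -1 if unknown
    final AtomicLong bytesRead = new AtomicLong();

    UploadProgress(long expected) { this.expected = expected; }

    /** Body of the light JSON endpoint the UI polls. */
    String toJson() {
        return "{\"bytesRead\":" + bytesRead.get() + ",\"contentLength\":" + expected + "}";
    }
}

// Counts bytes as the parser pulls them from the request body.
class CountingInputStream extends FilterInputStream {
    private final AtomicLong counter;

    CountingInputStream(InputStream in, AtomicLong counter) {
        super(in);
        this.counter = counter;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) counter.incrementAndGet();
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) counter.addAndGet(n);
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        counter.addAndGet(skipped);
        return skipped;
    }
}
```

The counter is an AtomicLong because the upload thread writes it while the polling request reads it from another thread.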

"Why not store files in the database?" It can work. Files in the database make backups simple and permissions clear. Files on disk are cheap and fast to serve. Pick one based on your ops story, not taste. If you store in the database, stream in and stream out; never slurp the whole file into memory.

Action-oriented close

Pick one library. Commons FileUpload is fine. Learn its knobs.
Set the max request size in the web server and again in the upload library. Keep the numbers in agreement so no layer quietly allows more.
Set the in-memory threshold low. Stream big files to disk from the start.
Write to a temp folder, then move to final storage. One atomic step.
Rename uploads to a random name. Keep metadata in your database.
Whitelist content types and file extensions. Reject everything else.
Add a janitor job that deletes temp files older than a day.
Expose a simple progress endpoint for the UI. Tie it to the session.
Test with a slow link and a browser mix. Firefox 1.5 and IE 6 both need love.
Log the request id, user, file size, and storage path. You will thank yourself.

Ship uploads like this and you get fewer surprises, fewer angry emails, and more trust. It is not flashy, but it is solid, and solid wins.
