The Privacy Cost of File Format Conversion: What Metadata Survives and What Gets Stripped

A format-by-format analysis of what hidden metadata survives file conversion and what gets stripped — covering images, PDFs, documents, and video. Know exactly what data travels with your converted files.

Published March 29, 2026 · Updated March 29, 2026

When you convert a file from one format to another, you're thinking about the visible content — the image, the text, the video. Will it look right? Will it fit the size requirements? Will the recipient be able to open it?

What most people don't think about is what invisible data travels along with that content. Every file format carries metadata — information embedded in the file beyond what you can see. Some of this metadata is harmless: the file's creation date, the software that produced it. Some of it is deeply personal: GPS coordinates showing where a photo was taken, revision history showing deleted text in a document, device serial numbers identifying the specific hardware that created the file.

File format conversion can either strip this metadata clean or carry it through untouched, depending on the format, the converter, and the processing method. Understanding what survives and what gets removed is essential knowledge for anyone who cares about the data they're sharing alongside their files.

How Metadata Works in File Formats

Before diving into specific conversions, it helps to understand how metadata is structured. Different file formats store metadata in different ways, and this structure determines what happens during conversion.

Image metadata: EXIF, IPTC, and XMP

Digital images can carry three overlapping metadata systems:

EXIF (Exchangeable Image File Format) is the most common and the most privacy-relevant. Created by the camera or smartphone at capture time, EXIF data typically includes GPS coordinates, timestamp, camera make and model, lens information, exposure settings, orientation, and a thumbnail preview of the image. EXIF is standard in JPEG and TIFF files.

IPTC (International Press Telecommunications Council) metadata is used primarily in professional photography and publishing. It includes fields for caption, copyright, creator, keywords, and location descriptions. IPTC data is typically added intentionally by photographers or editors, not automatically by cameras.

XMP (Extensible Metadata Platform) is Adobe's metadata standard, now also published as ISO 16684-1, and it is used across many file types. It's XML-based and extensible, meaning applications can store arbitrary data. Photoshop, Lightroom, and other Adobe tools write extensive XMP metadata, including editing history, the name of the color profile, and application settings.
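To see which of these three systems a JPEG actually carries, you can walk its header segments. EXIF and XMP live in APP1 segments, and IPTC usually sits inside a Photoshop APP13 segment. The following is a minimal standard-library sketch, assuming Python 3.9+. The function name `jpeg_metadata_segments` is hypothetical. It stops at the start-of-scan marker and skips many edge cases that a full parser such as ExifTool handles.

```python
import struct

# Payload signatures that identify each metadata system inside a JPEG header.
EXIF_SIG = b"Exif\x00\x00"                     # APP1
XMP_SIG = b"http://ns.adobe.com/xap/1.0/\x00"  # APP1
PHOTOSHOP_SIG = b"Photoshop 3.0\x00"           # APP13, where IPTC is usually stored


def jpeg_metadata_segments(data: bytes) -> list[str]:
    """Label the EXIF, XMP, and IPTC-carrying segments found before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan / end of image: header is over
            break
        # The length field counts itself (2 bytes) plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(EXIF_SIG):
            found.append("EXIF")
        elif marker == 0xE1 and payload.startswith(XMP_SIG):
            found.append("XMP")
        elif marker == 0xED and payload.startswith(PHOTOSHOP_SIG):
            found.append("IPTC/Photoshop")
        pos += 2 + length
    return found
```

Run it on a photo before and after conversion. An empty list means none of the three signatures turned up in the header.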

Document metadata: Properties and revision tracking

Office documents (DOCX, XLSX, PPTX) store metadata in two places:

Document properties include author name, organization, creation date, modification date, last saved by, total editing time, and application version. They are stored as XML parts inside the document's ZIP container, chiefly docProps/core.xml and docProps/app.xml.
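A quick way to see these properties is to open the ZIP container directly. The sketch below assumes the standard part names: docProps/core.xml holds author, last saved by, and dates, and docProps/app.xml holds application, company, and total editing time. The function name `office_properties` is illustrative. XLSX and PPTX files use the same parts, so it should work on them too.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Which fields to read from which part of the package.
FIELDS = {
    "docProps/core.xml": ["dc:creator", "cp:lastModifiedBy", "dcterms:created", "dcterms:modified"],
    "docProps/app.xml": ["ep:Application", "ep:AppVersion", "ep:Company", "ep:TotalTime"],
}


def office_properties(path_or_file) -> dict[str, str]:
    """Return the non-empty identifying properties of a DOCX/XLSX/PPTX file."""
    props = {}
    with zipfile.ZipFile(path_or_file) as zf:
        names = set(zf.namelist())
        for part, tags in FIELDS.items():
            if part not in names:
                continue
            root = ET.fromstring(zf.read(part))
            for tag in tags:
                el = root.find(tag, NS)
                if el is not None and el.text:
                    props[tag.split(":")[1]] = el.text
    return props
```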

Revision tracking is the more dangerous category. When Track Changes is enabled in Word, every insertion, deletion, and formatting change is recorded — including the original text before edits. Comments, annotations, and document comparison data are also stored. This data persists even if Track Changes is turned off, until it's explicitly accepted or removed.
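In a DOCX, tracked changes live in word/document.xml as `w:ins` and `w:del` elements, and the deleted text is kept in `w:delText`. Comments live in a separate word/comments.xml part. The hypothetical helper below counts these as a rough check for revision data still in the file. It is only a sketch and does not replace Word's Document Inspector.

```python
import xml.etree.ElementTree as ET
import zipfile

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def revision_summary(path_or_file) -> dict:
    """Count tracked insertions/deletions and comments, and recover deleted text."""
    with zipfile.ZipFile(path_or_file) as zf:
        body = ET.fromstring(zf.read("word/document.xml"))
        comments = 0
        if "word/comments.xml" in zf.namelist():
            comments = len(ET.fromstring(zf.read("word/comments.xml")).findall(f"{W}comment"))
    # Text that was "deleted" under Track Changes is still stored verbatim.
    deleted_text = "".join(t.text or "" for t in body.iter(f"{W}delText"))
    return {
        "insertions": sum(1 for _ in body.iter(f"{W}ins")),
        "deletions": sum(1 for _ in body.iter(f"{W}del")),
        "comments": comments,
        "deleted_text": deleted_text,
    }
```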

PDF metadata: A deceptive simplicity

PDFs appear to be simple — a fixed layout of text and images. But they can carry substantial metadata:

Document info dictionary includes title, author, subject, keywords, creator application, producer, and creation/modification dates.

XMP metadata stream provides a richer, extensible metadata container similar to image XMP.

Embedded content is the often-overlooked category. PDFs can contain JavaScript, embedded files (attachments), form field data, digital signatures with signer identity information, bookmarks, annotations with author names, and even hidden text layers that are invisible but selectable.
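For a first pass over a PDF, you can search its raw bytes for the name tokens behind these categories, such as /Author, /JavaScript, and /EmbeddedFiles. This approach is deliberately naive. Many PDFs store objects in compressed object streams, where these tokens never appear as plain bytes, so an empty result proves nothing. Treat the sketch as an illustration of where the data lives, and use a real parser such as qpdf or pypdf for actual verification. The names `pdf_red_flags` and `MARKERS` are hypothetical.

```python
import re

# A PDF name ends at whitespace or a delimiter; requiring one stops "/Sig" matching "/SigFlags".
END = rb"(?=[\s()<>\[\]{}/%]|$)"

MARKERS = [
    (rb"/Author", "info dictionary: author"),
    (rb"/Creator", "info dictionary: creator application"),
    (rb"/Producer", "info dictionary: producer"),
    (rb"/Metadata", "XMP metadata stream"),
    (rb"/J(?:avaScript|S)", "JavaScript"),
    (rb"/EmbeddedFiles?", "embedded files"),
    (rb"/AcroForm", "form fields"),
    (rb"/Annots", "annotations"),
    (rb"/Sig", "digital signature"),
]


def pdf_red_flags(data: bytes) -> list[str]:
    """Naive scan of uncompressed PDF bytes for metadata and active-content markers."""
    return [label for pattern, label in MARKERS if re.search(pattern + END, data)]
```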

Video metadata: Container and stream data

Video files store metadata at two levels. Container metadata (in the MP4, MKV, or WebM container) can include title, creation time, GPS coordinates (common in smartphone video), camera model, and custom tags. Stream metadata includes codec information, encoding settings, and technical parameters. Smartphone videos typically record a single location where recording began. Some devices, such as action cameras, also embed a continuous GPS track in a separate telemetry stream that shows the device's location throughout the recording.
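In MP4 and MOV files, one common home for a location is the QuickTime ©xyz atom inside moov/udta. It holds an ISO 6709 string such as +37.7749-122.4194/. The sketch below walks the box tree to find it. It assumes the QuickTime user-data text layout: a 16-bit length, then a 16-bit language code, then the text. The function names are hypothetical. Apple devices may also store location under the com.apple.quicktime.location.ISO6709 key in a moov/meta box, which this example does not parse, and telemetry tracks need their own tools.

```python
from __future__ import annotations

import struct

CONTAINERS = {b"moov", b"trak", b"udta"}  # boxes we descend into
LOCATION_ATOM = b"\xa9xyz"


def iter_boxes(data: bytes, start: int = 0, end: int | None = None):
    """Yield (type, body_start, body_end) for each box between start and end."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # 64-bit size follows the type
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif size == 0:  # box runs to the end of its parent
            size = end - pos
        if size < header:
            raise ValueError(f"corrupt box at offset {pos}")
        yield box_type, pos + header, pos + size
        pos += size


def find_location(data: bytes, start: int = 0, end: int | None = None) -> str | None:
    """Return the ©xyz location string if one exists under moov/udta or trak/udta."""
    for box_type, body_start, body_end in iter_boxes(data, start, end):
        if box_type == LOCATION_ATOM:
            (text_len,) = struct.unpack(">H", data[body_start:body_start + 2])
            text = data[body_start + 4:body_start + 4 + text_len]  # skip length + language
            return text.decode("utf-8", "replace")
        if box_type in CONTAINERS:
            found = find_location(data, body_start, body_end)
            if found:
                return found
    return None
```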

Image Conversions: Format-by-Format Analysis

Image conversion is where metadata behavior varies most dramatically, and where the privacy implications are highest because of EXIF GPS data.

JPEG to PNG

What gets stripped: This depends entirely on the processing method.

If the converter uses the HTML5 Canvas API, which is how browser-based tools like Fileza process images, all EXIF, IPTC, and XMP metadata is stripped. The Canvas API works by decoding the JPEG into raw pixel data, drawing those pixels onto a canvas, and then encoding the canvas contents as PNG. The canvas has no mechanism for carrying over metadata from the source. The result is a clean PNG that contains only pixel data and basic PNG structural chunks, though some browsers may also write color-space information.
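Canvas output is clean for a structural reason: the encoder only ever sees a pixel buffer. The Python sketch below mimics that idea outside the browser. It is not how browsers implement `toBlob`. It writes 8-bit RGB pixels into a PNG that contains only the IHDR, IDAT, and IEND chunks, so source metadata has no slot to travel into. The function name `pixels_to_png` is hypothetical.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    # Each PNG chunk: 4-byte length, 4-byte type, data, CRC-32 over type + data.
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def pixels_to_png(width: int, height: int, rgb: bytes) -> bytes:
    """Encode raw RGB pixels as a PNG that holds nothing but the image."""
    if len(rgb) != width * height * 3:
        raise ValueError("expected width * height * 3 bytes of RGB data")
    # 8-bit depth, color type 2 (RGB), default compression/filter, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    stride = width * 3
    # Every scanline is prefixed with filter type 0 (None).
    raw = b"".join(b"\x00" + rgb[y * stride:(y + 1) * stride] for y in range(height))
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))
```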

If the converter uses a direct format translation library that explicitly preserves metadata, EXIF data can be carried into the PNG, either in the dedicated eXIf chunk or as text in tEXt, iTXt, or zTXt chunks. Some desktop tools do this by default. ImageMagick, for example, keeps embedded profiles unless you pass -strip.
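To check whether a PNG picked up metadata this way, list its chunks. The following sketch uses a hypothetical function, `png_metadata_chunks`. It reports each text chunk together with its keyword, and also flags eXIf and tIME chunks. One convention some tools use is a zTXt chunk with the keyword "Raw profile type exif", which stores EXIF as text.

```python
import struct

PNG_SIG = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNKS = {b"tEXt", b"zTXt", b"iTXt"}
METADATA_CHUNKS = TEXT_CHUNKS | {b"eXIf", b"tIME"}


def png_metadata_chunks(data: bytes) -> list[tuple[str, str]]:
    """Return (chunk type, keyword) pairs for metadata-bearing chunks."""
    if not data.startswith(PNG_SIG):
        raise ValueError("not a PNG")
    found = []
    pos = len(PNG_SIG)
    while pos + 8 <= len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if kind in METADATA_CHUNKS:
            # Text chunks start with a null-terminated Latin-1 keyword.
            keyword = body.split(b"\x00", 1)[0].decode("latin-1") if kind in TEXT_CHUNKS else ""
            found.append((kind.decode("ascii"), keyword))
        if kind == b"IEND":
            break
        pos += 12 + length  # length + type + data + CRC
    return found
```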

Privacy impact: Canvas-based conversion effectively sanitizes the image. Direct translation may preserve GPS coordinates and other sensitive EXIF fields.

PNG to JPEG

What gets stripped: PNG files typically carry less sensitive metadata than JPEGs (no EXIF GPS data by default), but they can contain tEXt chunks with creation software, author information, and other custom fields. Converting through Canvas strips all of this and produces a clean JPEG with only the basic JFIF header.

What gets added: The JPEG encoder may add its own metadata — typically just the software identifier and basic JFIF structure. No personal information is added.

Privacy impact: Low risk in most cases, since PNGs rarely contain GPS data. The conversion cleans up whatever metadata exists.

Any format to WebP

What gets stripped: WebP can carry EXIF and XMP, which its extended format stores in dedicated RIFF chunks, and some encoders embed them. Canvas-based conversion doesn't. WebP produced through Canvas API processing contains minimal metadata, with no EXIF, no IPTC, and no XMP.
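A WebP file uses the RIFF container, so it is a sequence of four-character chunks. Metadata, when present, appears as EXIF and "XMP " chunks alongside a VP8X header (note the trailing space in "XMP "). This minimal sketch lists the chunk IDs without validating the image data. The function names are illustrative.

```python
import struct


def webp_chunks(data: bytes) -> list[str]:
    """List the FourCC IDs of the chunks inside a WebP RIFF container."""
    if data[:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("not a WebP file")
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        fourcc, size = struct.unpack("<4sI", data[pos:pos + 8])  # RIFF sizes are little-endian
        chunks.append(fourcc.decode("ascii"))
        pos += 8 + size + (size & 1)  # payloads are padded to an even length
    return chunks


def webp_has_metadata(data: bytes) -> bool:
    return any(c in ("EXIF", "XMP ") for c in webp_chunks(data))
```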

Privacy impact: WebP through Canvas is one of the cleanest conversions for metadata removal.

HEIC to JPEG or PNG

What gets stripped: HEIC files (Apple's default photo format) carry full EXIF metadata including GPS coordinates, timestamps, and device information. The conversion behavior depends on the decoding method:

  • Browser native decoding followed by Canvas rendering strips all EXIF data. Native HEIC decoding is limited (Safari supports it; most other browsers don't), so many tools fall back to a library.
  • heic2any library decodes to raw pixel data and re-encodes, stripping metadata.
  • FFmpeg WASM may or may not preserve metadata depending on the command flags used. The -map_metadata -1 flag explicitly strips it.

Privacy impact: HEIC photos from iPhones often contain precise GPS coordinates, because the Camera app records location whenever it has location access. Conversion through a Canvas-based pipeline is an effective way to strip this data before sharing.

Image resizing and its metadata effects

Resizing an image has different metadata implications depending on the tool:

Canvas-based resizing (as in browser-based tools) creates a new pixel buffer at the target dimensions. All metadata from the source is discarded. This is true regardless of whether you're also changing format — even resizing a JPEG and saving it as JPEG through Canvas strips EXIF data.

EXIF-aware resizing (Photoshop, Lightroom, many desktop tools) preserves metadata and updates the dimension fields. The GPS coordinates, timestamps, and other personal data remain intact.

Command-line tools vary. ImageMagick with the -strip flag removes metadata; without it, metadata is generally preserved. FFmpeg's handling of image metadata depends on the input and output formats, so check its output rather than assuming either way.

The takeaway: if your goal is to remove metadata while resizing, use a tool that processes through a Canvas pipeline or explicitly strips metadata.
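You can also strip metadata from a JPEG without re-encoding it. Drop the segments that carry the metadata and copy everything else byte for byte. The minimal sketch below does this with a hypothetical function, `strip_jpeg_metadata`. It removes APP1 (EXIF and XMP), APP13 (IPTC), and COM segments. It keeps APP0, the ICC profile in APP2, and the compressed image data. It assumes well-formed segments without fill bytes.

```python
import struct

# Segments dropped: APP1 (EXIF, XMP), APP13 (Photoshop/IPTC), COM (free-text comments).
DROP = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Remove metadata segments from a JPEG without touching the compressed image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: the rest is image data plus EOI, copied as-is.
            out += data[pos:]
            return bytes(out)
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in DROP:
            out += data[pos:pos + 2 + length]  # keep APP0, APP2 (ICC), tables, frame header
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

Expect one side effect. Dropping EXIF also drops the orientation tag, so a phone photo whose pixels are stored sideways may display rotated after stripping. A pixel-based pipeline avoids this only if it applies the rotation before encoding.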

Document Conversions: What Survives the PDF Journey

Converting documents to PDF is the most common document conversion, and it has significant metadata implications.

Word (DOCX) to PDF

What gets stripped:

  • Tracked changes — When you export with markup hidden, insertions and deletions are rendered in their final (accepted) state, and the revision history does not appear in the PDF. If Word's PDF options are set to "Document showing markup," the tracked changes are printed into the PDF instead.
  • Comments — Depending on export settings, comments may be stripped entirely or rendered as PDF annotations. Microsoft Word's "Save as PDF" dialog has an option to include or exclude comments.
  • Embedded macros — VBA macros in Word documents are not carried over to PDF.

What survives:

  • Author name — The document's author field typically carries over to the PDF's metadata, unless you clear the "Document properties" option in Word's PDF export settings.
  • Creation and modification dates — These persist in the PDF info dictionary.
  • Software identifier — The PDF will identify that it was created by Microsoft Word.
  • Hidden text — Text formatted as hidden in Word may or may not appear in the PDF, depending on the export method. Some converters render hidden text; others skip it.
  • Internal bookmarks and cross-references — These are not stripped; they can be converted into PDF bookmarks or internal links.
  • Hyperlinks — All links in the document are preserved as clickable links in the PDF.
  • Embedded fonts — Font data is embedded, usually as a subset, and can carry font metadata such as the font's name, its vendor, and in some cases licensing details.

Privacy impact: The most dangerous items — tracked changes and comments — are typically removed. But the author name, software version, and dates persist. For sensitive documents, inspect the PDF properties after conversion and strip remaining metadata.

Excel (XLSX) to PDF

What gets stripped:

  • Hidden rows, columns, and sheets — Only the visible, printed content appears in the PDF.
  • Formulas — Cell formulas are replaced by their calculated values. The PDF shows numbers, not the underlying calculations.
  • Cell comments/notes — Typically not included in PDF output unless print settings specify otherwise.
  • Data connections — Links to external data sources are not preserved.

What survives:

  • Author and document properties — Same as Word, the author name and dates carry over.
  • Print titles — Rows or columns repeated on each page still shape the PDF's layout, though named ranges and print settings aren't exposed as metadata in the output.

Privacy impact: The removal of formulas and hidden sheets is the most significant privacy benefit. A spreadsheet that contains a hidden sheet with employee salary data will not include that sheet in the PDF, provided the sheet is actually hidden. Data that is merely scrolled out of view still prints if it falls within the print area.
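Hidden sheets stay in the XLSX even when they are left out of the PDF. Before you share either file, it is worth knowing which sheets a workbook hides. Excel records sheet visibility in xl/workbook.xml as a `state` attribute on each sheet: "hidden" or "veryHidden", with a missing attribute meaning the sheet is visible. The helper name `sheet_states` is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def sheet_states(path_or_file) -> dict[str, str]:
    """Map each sheet name to its visibility: visible, hidden, or veryHidden."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    return {s.get("name"): s.get("state", "visible") for s in root.iter(f"{S}sheet")}
```

A veryHidden sheet doesn't appear in Excel's Unhide dialog, which makes it easy to forget it exists.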

PowerPoint (PPTX) to PDF

What gets stripped:

  • Speaker notes — Not included in standard PDF export (though there's usually an option to include them).
  • Animations and transitions — These are dynamic effects that don't exist in PDF.
  • Embedded audio/video — Not carried into the PDF export.

What survives:

  • Author and properties — Persistent across the conversion.
  • Slide content — All visible text, images, and shapes are rendered.
  • Hyperlinks — Preserved as clickable links.

Privacy impact: Speaker notes are the primary concern. Presenters often include confidential talking points, internal commentary, or audience-specific notes. Verify export settings to ensure notes are excluded.
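Speaker notes are stored as separate parts, ppt/notesSlides/notesSlideN.xml, with the text in DrawingML `a:t` elements. The sketch below collects that text so you can see what a PPTX would expose if you shared it directly or exported it with notes enabled. The function name `speaker_notes` is hypothetical. The sketch makes two assumptions. First, the part number N usually matches the slide order, but this is not guaranteed. Second, placeholder text such as slide numbers may appear in the output.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

A_T = "{http://schemas.openxmlformats.org/drawingml/2006/main}t"
NOTES_PART = re.compile(r"ppt/notesSlides/notesSlide(\d+)\.xml$")


def speaker_notes(path_or_file) -> dict[int, str]:
    """Return non-empty notes text keyed by the notes part number."""
    notes = {}
    with zipfile.ZipFile(path_or_file) as zf:
        for name in zf.namelist():
            m = NOTES_PART.match(name)
            if not m:
                continue
            root = ET.fromstring(zf.read(name))
            text = " ".join(t.text for t in root.iter(A_T) if t.text)
            if text.strip():
                notes[int(m.group(1))] = text
    return notes
```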

PDF Conversions: The Hidden Layers

PDFs themselves carry more hidden data than most people realize, and converting from PDF to other formats has variable metadata outcomes.

PDF to Image (JPEG, PNG)

What gets stripped:

  • Selectable text — Rasterizing a PDF converts all content to pixels. Text is no longer selectable or searchable. Hidden text layers are effectively destroyed.
  • Hyperlinks — Links become part of the pixel image and are no longer clickable.
  • JavaScript and interactive elements — Eliminated entirely.
  • Embedded files — PDF attachments are not included in the rasterized output.
  • Annotations and comments — Rendered visually but no longer interactive or attributed.
  • Form field data — Rendered as static content.

What survives:

  • The visual content of the PDF, including any visible watermarks, stamps, or overlays.
  • If the image is saved with metadata preservation, the creation software and dates may appear in the image's metadata. Canvas-based rendering strips this.

Privacy impact: PDF-to-image conversion is one of the most thorough metadata removal processes available. By rasterizing the PDF, you eliminate all non-visual data — hidden text, JavaScript, embedded files, interactive elements, and metadata. This is useful when you need to share a document's visual content while ensuring no hidden data travels with it.

PDF to Word (DOCX)

What gets stripped:

  • Digital signatures — Not preserved in the conversion.
  • JavaScript — Not carried over to Word format.

What survives or is recreated:

  • Text content — Extracted and placed into Word paragraphs.
  • Author metadata — May be carried over from the PDF into the Word document properties.
  • Images — Extracted and embedded in the Word document.
  • Formatting — Approximate recreation of the PDF layout in Word's formatting model.

Privacy impact: PDF-to-Word conversion can actually increase metadata exposure. Word files support richer metadata than PDFs (document properties, revision tracking, and comments), and the conversion tool may add its own metadata on top.

Video Conversions: Container Metadata Behavior

Video metadata behavior during conversion depends on whether the transcoder copies container metadata or starts fresh.

MP4 to WebM (or other container changes)

What typically gets stripped:

  • GPS location data — Often dropped, but not guaranteed. FFmpeg, the most common engine, copies global metadata from the first input file by default, so a location tag can survive if the output container has somewhere to store it.
  • Custom tags and user data — Container-specific tags often don't have equivalents across formats.
  • Creation device information — Camera make/model metadata often doesn't survive cross-container conversion, though the same default copying can carry it over as a generic tag.

What typically survives:

  • Creation timestamp — Many transcoders copy or set the creation time in the output container.
  • Encoder information — The output will identify the encoding software (e.g., "Lavf" for FFmpeg).
  • Technical stream data — Codec, resolution, frame rate, sample rate are inherent to the encoded streams.

Privacy impact: Video format conversion through FFmpeg-based tools (including browser-based converters that use FFmpeg WASM) can drop the most sensitive metadata, GPS location and device identification. However, FFmpeg's default is to copy global metadata from the source. For certainty, use the -map_metadata -1 flag (or verify that the tool applies it) to exclude global metadata from the output, and then inspect the result.
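Stripping and checking can both be scripted. The sketch below builds FFmpeg and ffprobe argument lists rather than running them; you would pass the lists to `subprocess.run` yourself. The helper names (`ffmpeg_strip_args`, `ffprobe_args`, `leftover_tags`) are hypothetical. `-map_metadata -1` disables FFmpeg's automatic copying of global metadata. Per-stream tags such as language or handler names may still appear, so the ffprobe step reads both levels. Setting `copy_streams=True` remuxes without re-encoding. That only works when the codecs are allowed in the output container; H.264, for instance, can't go into WebM.

```python
import json


def ffmpeg_strip_args(src: str, dst: str, copy_streams: bool = False) -> list[str]:
    """Argument list for an FFmpeg run that does not copy global metadata from src."""
    args = ["ffmpeg", "-i", src, "-map_metadata", "-1"]
    if copy_streams:
        # Remux without re-encoding; only valid if the codecs fit dst's container.
        args += ["-c", "copy"]
    return args + [dst]


def ffprobe_args(path: str) -> list[str]:
    """Argument list that asks ffprobe for container and stream info as JSON."""
    return ["ffprobe", "-v", "quiet", "-print_format", "json",
            "-show_format", "-show_streams", path]


def leftover_tags(ffprobe_json: str) -> dict[str, str]:
    """Flatten container-level and per-stream tags from ffprobe's JSON output."""
    info = json.loads(ffprobe_json)
    tags = dict(info.get("format", {}).get("tags", {}))
    for i, stream in enumerate(info.get("streams", [])):
        for key, value in stream.get("tags", {}).items():
            tags[f"stream{i}:{key}"] = value
    return tags


# Usage (requires ffmpeg and ffprobe on PATH):
#   import subprocess
#   subprocess.run(ffmpeg_strip_args("in.mp4", "out.webm"), check=True)
#   probe = subprocess.run(ffprobe_args("out.webm"), capture_output=True, text=True, check=True)
#   print(leftover_tags(probe.stdout))
```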

Trimming and cropping video

Trimming a video (cutting to a time range) or cropping (changing the frame dimensions) through re-encoding has the same metadata implications as a full format conversion: the output is a new file produced by the encoder, and metadata carryover depends on the tool's configuration. If the tool uses stream copying (copying encoded data without re-encoding), container metadata is more likely to survive.

Practical Metadata Privacy Guide

Based on the analysis above, here's a practical reference for metadata-conscious file handling:

Conversions that effectively sanitize metadata

  • Any image format to any other via Canvas API — JPEG to PNG, PNG to WebP, HEIC to JPEG, etc. through a browser-based tool that uses Canvas rendering. All EXIF, IPTC, and XMP data is stripped.
  • Word/Excel/PowerPoint to PDF — Tracked changes, hidden content, macros, and most internal metadata are removed. Author name and dates may persist and should be stripped separately.
  • PDF to rasterized image — All non-visual content is eliminated, including hidden text, scripts, embedded files, and interactive elements.
  • Video re-encoding with metadata copying disabled (FFmpeg's -map_metadata -1) — GPS data and device metadata stored as global tags are not copied to the output.

Conversions that may preserve metadata

  • Image format conversion through metadata-aware tools — Desktop editors and command-line tools that explicitly copy EXIF data to the output.
  • PDF to Word — Text, images, and some metadata are extracted and may be augmented with additional Word metadata.
  • Video stream copying (without re-encoding) — Container metadata is more likely to survive when streams are copied rather than re-encoded.
  • Same-format reprocessing — Some tools that open and re-save in the same format (JPEG to JPEG, for example) may preserve metadata unless explicitly configured to strip it.

The safest approach: Convert and verify

The most reliable metadata removal strategy combines conversion with verification:

  1. Convert through a Canvas-based or re-encoding pipeline. Browser-based tools that process images through the Canvas API, or video through FFmpeg WASM with metadata copying disabled, strip metadata as part of the conversion process.

  2. Verify the output. After conversion, check the output file's properties. For images, use an EXIF viewer. For PDFs, check Document Properties. For video, check the file's metadata through your operating system or a metadata inspector.

  3. Process locally. Using a browser-based converter like Fileza ensures that your files — including whatever sensitive metadata they contain — never leave your device. Uploading a file full of GPS coordinates and personal metadata to a server-based converter for the purpose of stripping that metadata defeats the privacy purpose entirely.

The Bottom Line

Metadata is the shadow data that follows your files. It records where you've been, what device you used, what edits you made, and sometimes reveals content you thought you'd deleted. Every file format has its own metadata characteristics, and every conversion has its own metadata outcomes.

Understanding what survives conversion and what gets stripped gives you control over the data you share. The cleanest approach, processing through a pipeline that reconstructs the output from content rather than copying the container, is both the most effective for metadata removal and the simplest to implement. A browser-based converter that processes images through Canvas, or video through FFmpeg WASM configured to drop metadata, provides this naturally. It also keeps your metadata-rich source files on your own device.

The next time you convert a file before sharing it, take an extra 30 seconds to check the output's properties. What you find — or don't find — will tell you exactly how much invisible data is traveling with your file.