What is PDF Metadata? Why is it useful, and how can it help me?
If you've ever wondered what PDF metadata is, you've come to the right place. This blog will outline what PDF metadata is, why it is useful, what it has to do with accessibility and how it can help your organisation.
TL;DR (Too long; didn't read)
- PDFs are often inaccessible.
- If you have to use PDFs, ensure your PDF has metadata.
- PDF metadata makes your PDF more accessible by helping assistive technology understand its content.
- Insytful can help to identify any PDFs on your website that are not accessible.
Quicklinks
What is PDF metadata?
How can PDF metadata help my organisation?
How can I make my PDFs accessible?
How can I check if my PDFs have metadata?
Additional reading
What is PDF Metadata?
First of all, what is metadata?
Metadata: information about content, a document or a webpage. It is usually hidden in the code rather than shown on the page, and it describes properties of the content you are viewing, for example the date created, title and author.
Metadata can be found in many places, such as on documents, images, web pages, spreadsheets and PDFs.
PDF metadata provides additional information and context about a PDF document. Some examples of PDF metadata include:
- The title of the PDF document
- The author of the PDF
- The creation date of the PDF
Why is PDF metadata useful?
PDF metadata has many benefits, including:
- Accessibility and inclusivity
- Document audit trails
- Quick search and find
- Compliance with accessibility legislation
- Ease of navigation
Accessibility and inclusivity
PDF metadata helps users with disabilities understand important information about a document. For example, providing the author, title, language and subject helps screen readers and other assistive technology (AT) interpret the content. With more information available to AT, the content announced to users is more useful and easier to understand.
Other benefits PDF metadata provides for accessibility are:
Ease of navigation
Together with tags, metadata can be used to define headings and a logical reading order within a document. A correct heading order helps users of assistive technology navigate lengthy or complex content.
Compliance with accessibility legislation
Accessibility guidelines require metadata fields to be filled out correctly to ensure users with disabilities can access key information.
Documents with correct metadata are more discoverable and improve usability and accessibility for all users.
Document audit trails
PDF metadata provides an audit trail about the document, removing guesswork about who created it and when it was created.
Quick search and find
PDF metadata is searchable and can be read by search tools. A standard office computer contains hundreds of documents that users access from time to time, so searchable metadata makes it quicker to find the right one.
How can PDF metadata help my organisation?
Adding metadata to your PDFs helps your organisation to be inclusive and legally compliant.
PDFs are known to be an accessibility nightmare. Therefore, if your organisation cannot move away from PDFs online to HTML, the next best thing is an accessible PDF. By ensuring your documents have accurate PDF metadata, you are taking action to make them accessible and inclusive.
By providing accessible PDFs, your organisation meets legal standards and makes documents accessible to all.
Why are PDFs bad for accessibility?
Most PDFs have not been designed to be accessible. Without the correct structure, metadata and tagging, screen readers cannot easily access PDFs, which makes them bad for accessibility. PDFs with missing metadata and tags are a big issue, but scanned PDF documents are one step worse.
Scanned documents are recognised as images and not text, which makes them impossible for screen readers to read.
Following PDF best practice
The Government Digital Service (GDS) insists that content for the UK government should be published in HTML instead of PDF. HTML content is far easier to find and use than PDF content, and it is also much easier to maintain.
How can I make my PDFs accessible?
There are two routes to make your PDFs accessible: creating accessible PDFs and optimising existing PDFs.
Creating accessible PDFs
Route one is to make your PDFs accessible at the point of creation, such as by adding alt text to images and maintaining a proper heading structure throughout the document.
Optimising existing PDFs for accessibility
Route two is to optimise PDFs after they have been created. For example, during a content audit you might find legacy PDFs on your website that are not accessible, perhaps because they are scanned documents or have missing metadata. To learn how to make your PDFs accessible for everyone, visit our blog.
What to do if you need to publish a PDF?
The GDS recommends that, if you cannot avoid creating a PDF, you should:
- Have the equivalent content published in HTML.
- Ensure the PDF also meets archiving standards.
- Create the PDF in line with accessibility standards, with PDF metadata added.
What metadata can be found on a PDF?
Standard PDF metadata includes the title, keywords, language, author, file size, copyright details, creation date and last modification date.
Why should I add metadata to a PDF?
Adding metadata makes searching for documents easier and improves usability and accessibility. In CMSs and tools like Insytful, metadata allows you to sort and find documents quickly based on their attributes.
How do I add metadata to PDFs?
When PDFs are created, metadata is automatically generated from the source document. By visiting the document properties, you can view and edit some of the automatically generated metadata.
You can also add further information to improve the quality of the PDF's metadata.
For example, keywords can be added to the PDF to describe what the document is about and essentially tag it. Suppose you don't know the name of a document, but you want to find all legal documents from 2024. If you have added "2024" and "legal document" as keywords, those documents will appear in searches.
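To illustrate the keyword search described above, here is a minimal Python sketch. It assumes the keywords have already been extracted from each PDF into a simple record; the `PdfRecord` class, the `find_by_keywords` function and the file names are hypothetical and only exist for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PdfRecord:
    """Hypothetical record holding keywords already extracted from a PDF."""
    filename: str
    keywords: List[str] = field(default_factory=list)


def find_by_keywords(records, wanted):
    """Return records whose keywords include every wanted term (case-insensitive)."""
    wanted_lower = {term.lower() for term in wanted}
    return [
        record for record in records
        if wanted_lower <= {keyword.lower() for keyword in record.keywords}
    ]


# Made-up sample data for illustration only.
library = [
    PdfRecord("contract-a.pdf", ["2024", "legal document"]),
    PdfRecord("contract-b.pdf", ["2023", "Legal Document"]),
    PdfRecord("analytics.pdf", ["2024", "report"]),
]

matches = find_by_keywords(library, ["2024", "legal document"])
print([record.filename for record in matches])
```

Matching is case-insensitive here so that "Legal Document" and "legal document" are treated as the same keyword. A real search tool may behave differently.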
How can I check if my PDFs have metadata?
Tools like Insytful can automatically check all the PDFs on your website for metadata and tell you whether they are likely to be accessible. Insytful checks for the title, keywords, language, creator and producer, author, file size, date created, last modified and format version.
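As a rough idea of what an automated metadata check can look like, the Python sketch below reports which fields are missing or blank in a dictionary of metadata. This is not how Insytful works internally; the `REQUIRED_FIELDS` list and the `missing_metadata` function are assumptions made for this example, and the metadata is assumed to have been read from the PDF already.

```python
# Assumed list of fields to check; adjust it to your own publishing standards.
REQUIRED_FIELDS = ("title", "keywords", "language", "author")


def missing_metadata(metadata):
    """Return the required fields that are absent or blank in a metadata dict."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Made-up metadata for one document: no keywords, and an empty author.
sample = {
    "title": "Website Analytics Results 2024",
    "author": "",
    "language": "en-GB",
}

print(missing_metadata(sample))  # prints ['keywords', 'author']
```

Running a check like this over every PDF on a site gives a simple list of documents that need attention.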
What metadata can I see in Insytful?
Title
The title of the document, for example "Website Analytics Results 2024," is arguably the most critical piece of data a document can have; it describes what the document is and gives context, especially for screen reader users who can't skim documents quickly. The title is also shown on search results pages and helps search engines return the right document for the right search.
Keywords
Keywords related to your document, for example "analytics" or "report".
Keywords can help you browse, find documents, and audit your content quickly, especially if you don't know the title of the document you are looking for. If your keywords are all correct, you can create custom searches and reports to show all documents tagged with helpful keywords, such as "instruction manual" for all instruction manuals.
Language
The language attribute, for example, "en-gb," is one of the most critical pieces of data in terms of accessibility. It identifies the document's language so that screen readers and text-to-speech systems know how to read it. There is nothing more infuriating than a document written in Spanish being read out by a screen reader that thinks it is English.
Creator and Producer
Many PDF documents are first created in a native application and later exported, saved as or converted to PDF; the Creator and Producer attributes describe this software. For example, the Creator could be "Microsoft Word," and the Producer (the software that converted it from Word to PDF) could be "Adobe PDF Library."
Author
The author attribute, for example, "John Smith, j.smith@website.com," allows you to define who wrote or produced the document's contents. This could be surfaced in search result pages and other locations, giving authority to the document. It also allows you to easily search for all content produced by a specific person.
File size
Computer systems automatically calculate the size of the document. This allows you to see how big a file is and decide if it's appropriate for its intended purpose.
Date created
The date that the document was created.
Last modified
The date that the document was last edited.
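Inside a PDF file, the created and last modified dates are usually stored as text in the form D:YYYYMMDDHHmmSS followed by an optional time zone, for example D:20240115103000+01'00'. The Python sketch below shows one way to turn such a string into a regular date. The function name `parse_pdf_date` is an assumption for this example, and the sketch only handles strings that start with the "D:" prefix.

```python
import re
from datetime import datetime, timedelta, timezone

# Year is required; every later part of the date string is optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(\d{2})?'?)?)?$"
)


def parse_pdf_date(raw):
    """Convert a PDF date string such as D:20240115103000+01'00' to a datetime."""
    match = _PDF_DATE.match(raw.strip())
    if match is None:
        raise ValueError(f"Not a recognised PDF date string: {raw!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = match.groups()

    tz = None  # No time zone information given: return a naive datetime.
    if sign in ("Z", "z"):
        tz = timezone.utc
    elif sign in ("+", "-"):
        offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(offset if sign == "+" else -offset)

    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=tz,
    )


print(parse_pdf_date("D:20240115103000+01'00'").isoformat())
```

Some software writes dates in other layouts, so a production tool would need to be more forgiving than this sketch.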
Format version
The PDF format is an industry-wide standardised format that undergoes reviews and iterations, implementing new features over time. Each iteration receives its own version number, for example "1.4." Knowing which PDF version a document was created with gives you some insight into whether it's outdated and whether it supports the latest accessibility features.
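A PDF file normally starts with a short header such as %PDF-1.4, which declares the format version. The Python sketch below reads that header from the first bytes of a file. The function name `pdf_header_version` is an assumption for this example, and the header is only part of the picture, because a document can also declare a newer version inside the file itself.

```python
import re

# The header looks like "%PDF-1.4" and appears near the start of the file.
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data):
    """Return the version declared in a PDF header, e.g. '1.4', or None."""
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return f"{match.group(1).decode()}.{match.group(2).decode()}"


# Example bytes standing in for the start of a real PDF file.
print(pdf_header_version(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"))  # prints 1.4
```

To check a real document, read the first kilobyte of the file in binary mode and pass it in, for example `pdf_header_version(open(path, "rb").read(1024))`, where `path` is the location of your PDF.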
Summary
If your organisation needs to use PDFs on your website, remember to add PDF metadata to make them inclusive and accessible.
Additional reading
For more information about creating accessible PDFs, read our step-by-step guide, how to make PDFs accessible for everyone.
Tools to help check your PDF metadata
Checking all your PDFs for metadata can be a manual and tedious job. We recommend using an automated tool to find all the inaccessible documents on your website.
Tools available to check for PDF metadata:
- Adobe Acrobat
- Insytful
Get a free website health check with Insytful
Impress your website visitors with an accessible and usable website. With Insytful, you can find out the health of your website and check for accessibility errors.
Improve your web accessibility, and get the first 100 pages scanned for free.
