
PDF inventory

Page last updated 30 April 2026

The PDF inventory collates all PDFs found on your website during an Insytful scan.

It provides a single view of every PDF, its metadata and its accessibility status, so you can quickly identify PDFs that need attention.

This is particularly useful if you're auditing a large site for accessibility compliance, cleaning up legacy documents, or preparing to meet standards and regulations such as WCAG 2.2 or the European Accessibility Act.

What information does Insytful scan for?

During a scan, Insytful collects the following for each PDF it finds:

Document URL and title

The web address where the PDF is hosted and the title set in the document's metadata. A missing or generic title ("Untitled" or a filename like report_final_v3.pdf) makes the document harder for users and search engines to identify.

Occurrences

The number of pages on your site that link to or embed this PDF. A high count means the document is widely referenced, so fixing any issues with it will have a bigger impact across your site.

Accessibility score

A rating that reflects how accessible the PDF is, based on factors like tagged structure, reading order, alternative text for images, and metadata completeness. A low score indicates barriers for users who rely on assistive technologies such as screen readers.

Author

The person or organisation recorded in the document's metadata. This helps with document ownership and accountability, especially when managing a large content library across multiple teams.

Machine readable

Whether the PDF contains actual text that software can read and interpret, or whether it's essentially a scanned image. If a PDF isn't machine readable, screen readers cannot parse its content, making it inaccessible to visually impaired users. These documents typically need OCR (Optical Character Recognition) processing to become accessible.

Created date and last modified date

When the document was first created and when it was last changed. Useful for spotting outdated content that may need reviewing, updating, or removing.

Language

The language set in the document's metadata. This tells screen readers which language to use when reading the document aloud. Without it, a screen reader may default to the wrong language, making the content unintelligible.

Size (KB and MB)

The file size of the document. Large PDFs can cause slow downloads and poor user experience, particularly on mobile devices or slower connections.
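
If you work with inventory data outside Insytful, the fields above lend themselves to a simple prioritisation. The sketch below is illustrative only: the PdfRecord class, its field names, and the 0-100 score scale are assumptions, not Insytful's actual data format. It ranks PDFs so that widely referenced, low-scoring documents come first.

```python
from dataclasses import dataclass


@dataclass
class PdfRecord:
    # Hypothetical field names; not Insytful's actual schema.
    url: str
    occurrences: int          # pages that link to or embed the PDF
    accessibility_score: int  # assumed to be on a 0-100 scale


def rank_by_impact(records):
    """Order PDFs so widely referenced, low-scoring ones come first."""
    def impact(record):
        # Weight the score shortfall by how many pages reference the PDF.
        return record.occurrences * (100 - record.accessibility_score)
    return sorted(records, key=impact, reverse=True)


inventory = [
    PdfRecord("https://example.com/a.pdf", occurrences=2, accessibility_score=30),
    PdfRecord("https://example.com/b.pdf", occurrences=40, accessibility_score=55),
    PdfRecord("https://example.com/c.pdf", occurrences=10, accessibility_score=95),
]

for record in rank_by_impact(inventory):
    print(record.url)
```

The weighting is one possible choice. Sorting by occurrences alone, or by score alone, may suit your audit better.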

Missing metadata

If a document has missing metadata, Insytful highlights it on the PDF inventory page so you can address it. Missing metadata is one of the most common and most easily fixed PDF accessibility issues. Insytful flags the following:

No document title

The PDF has no title set in its properties. Users see this title in browser tabs, search results, and screen reader announcements. Without it, they may only see a filename, which is often meaningless.

No author data

No author is recorded. While this is less critical for end-user accessibility, it makes document governance more difficult and may be a compliance requirement in some sectors.

No language data

No language tag is set. As noted above, this directly affects how screen readers pronounce the content. For multilingual sites, this is especially important.

No keywords detected

The document contains no keyword metadata. Keywords help with search engine discoverability and internal content categorisation.

For a deeper look at why this matters, see our guide PDF metadata: what it is, why it's useful, and how it can help.
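
To make the four flags concrete, here is a minimal sketch of the same check applied to metadata you have already extracted from a PDF. The function name, the dictionary keys, and the rule that blank values count as missing are assumptions for illustration; this is not how Insytful itself is implemented.

```python
# Hypothetical mapping from a metadata field to the flag described above.
FLAG_LABELS = {
    "title": "No document title",
    "author": "No author data",
    "language": "No language data",
    "keywords": "No keywords detected",
}


def missing_metadata_flags(metadata):
    """Return a flag label for every field that is absent or blank."""
    flags = []
    for field, label in FLAG_LABELS.items():
        value = metadata.get(field)
        # Treat None, empty strings, and whitespace-only values as missing.
        if value is None or not str(value).strip():
            flags.append(label)
    return flags


example = {"title": "Annual report 2025", "author": "", "language": "en-GB"}
print(missing_metadata_flags(example))
```

In this example the author is blank and no keywords are present, so those two flags are returned.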

What should I do about flagged PDFs?

Once you've identified PDFs with issues, here are the typical next steps:

For missing metadata

Open the PDF in a tool that supports metadata editing, such as Adobe Acrobat Pro or a free alternative like ExifTool. Fill in the title, author, language, and keywords fields, then re-upload the corrected file to your site.

For PDFs that aren't machine readable

These are usually scanned documents saved as images. Run them through an OCR tool to convert the image content into selectable, searchable text. Adobe Acrobat, ABBYY FineReader, and a number of open-source tools can handle this.

For low accessibility scores

Review the PDF's tag structure, reading order, and image alt text. Depending on how the document was originally created, remediation can range from quick metadata fixes to more involved structural work, or to replacing the PDF with an HTML page.

For large file sizes

Consider compressing the PDF or, where appropriate, converting the content to an HTML page for a better web experience.

For outdated documents

If the created or modified dates suggest a document is stale, review whether it should be updated, replaced, or removed from the site entirely.
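
The next steps above can be sketched as a small triage routine. Everything here is an assumption made for illustration: the dictionary keys, the score threshold of 50, the 10 MB size limit, and the two-year staleness window are placeholders to adapt to your own content policy, not values used by Insytful.

```python
from datetime import date, timedelta

# Illustrative thresholds; tune them to your own content policy.
LOW_SCORE = 50
LARGE_FILE_MB = 10
STALE_AFTER = timedelta(days=2 * 365)


def next_steps(pdf, today):
    """List suggested actions for one PDF, following the steps above."""
    steps = []
    if pdf["missing_metadata"]:
        steps.append("fill in metadata and re-upload")
    if not pdf["machine_readable"]:
        steps.append("run OCR")
    if pdf["accessibility_score"] < LOW_SCORE:
        steps.append("review tags, reading order and alt text")
    if pdf["size_mb"] > LARGE_FILE_MB:
        steps.append("compress or convert to HTML")
    if today - pdf["last_modified"] > STALE_AFTER:
        steps.append("review whether to update, replace or remove")
    return steps


scanned_form = {
    "missing_metadata": ["No language data"],
    "machine_readable": False,
    "accessibility_score": 20,
    "size_mb": 3.5,
    "last_modified": date(2019, 6, 1),
}
print(next_steps(scanned_form, today=date(2026, 4, 30)))
```

Passing the current date in as a parameter keeps the staleness check repeatable, which is helpful if you rerun the triage against an older scan.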

Still need help?

If you still need help, reach out to the Insytful community on Slack or raise a support ticket for assistance.
