Simplify your intranet by using PDFs – but DON’T make documents searchable

November 29, 2008

Reading the Document Imaging Blog about how Google are now indexing PDFs and using OCR to make them searchable made me think about PDFs in general, and about whether searchable PDFs and other document formats are a good thing for intranets. At the moment I guess that search in most intranets doesn’t index PDFs, but there are search products out there that will let you search every document on your intranet.

As you can probably guess from the title, I come down on the side of not making PDFs and other documents searchable within an intranet, for the reasons I give below.

I am not against using PDFs, in fact I find them extremely useful. Bundling up content in discrete sets as PDF documents in intranets makes absolute sense in many cases.

The advantages are –

  • Simpler navigation by reducing the number of web pages required
  • In content management systems it allows some content to be free of the straitjacket of CMS uniformity. Content can be arranged differently and graphics used in a way that makes sense within the document itself and takes into account the type of content being displayed
  • If the document is large, good internal navigation will allow users to navigate as if they were still using HTML web pages and the back button will always return them to the landing page
  • Links to other parts of the intranet and the internet can still be preserved within the PDF
  • PDFs look nice and crisp when viewed on screen
  • They can be protected using different levels of security
  • PDFs are searchable through the PDF reader
  • They are a lot less work. If the content owner needs to change something in the document, they simply change the Word version they hold and send it to the intranet team, who check the changes, turn the Word document into a PDF, remove the previous version and upload the latest one. This usually only takes minutes, whereas large changes to content on web pages can mean hours of work

However you must make sure that each document has its own landing page. This is to ensure that the context for each document can be made explicit and to allow for metadata to be attached that is relevant only to the document.
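To make the landing page idea a little more concrete, here is a minimal sketch in Python of the kind of record a landing page might hold. The class name, the field names and the example values are all my own assumptions for illustration, not something taken from a particular CMS or search product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentLandingPage:
    """Hypothetical landing page record; every field name is an assumption."""
    title: str
    summary: str          # short description giving the document's context
    status: str           # e.g. "mandatory" or "guidance"
    owner: str            # content owner who holds the source document
    document_url: str     # link to the PDF itself
    keywords: List[str] = field(default_factory=list)

    def search_text(self) -> str:
        """Text a search engine would index for this page.

        Only the landing page fields are included, never the PDF's content.
        """
        return " ".join([self.title, self.summary, self.status, *self.keywords])


page = DocumentLandingPage(
    title="Annual leave policy",
    summary="How to request and record annual leave.",
    status="mandatory",
    owner="HR",
    document_url="/docs/annual-leave-policy.pdf",
    keywords=["holiday", "leave"],
)
```

The point of the sketch is simply that the metadata lives with the landing page and applies to that one document only.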

OK I’ve hopefully sold you on PDFs but what about not making them searchable? I have been told that searchable PDFs will be a very good thing for intranets but I just don’t get it. The poor users put their search terms in and, as all documents are searchable, they will get a mountain of results back. Then, when they click on a result, it will land them on a document containing the search term. This can be a problem as a lot of documents aren’t set up like the best web pages and if they are PDFs from external sources, e.g. legislation, H & S advice etc., you won’t be able to change them anyway. The problem is context.

In good web pages you should be able to land on any page and have an idea where you are and what the page is about. This is not true for a lot of documents. Users also need to know the status of a document, e.g. is it mandatory or for guidance only? So is the answer to put more and more information in the document, so that users know where they are when they land on a document or a document page? I think there is a simpler answer than that: make documents non-searchable and non-accessible except through their landing pages. If users can only reach a document through its landing page, and the intranet team has done its work correctly, the page will be findable and the information on it will give users the context they need. This way, when they open a document, they will know what they are letting themselves in for.

Context is the key. As a by-product of not making your documents searchable, you should be able to keep down the number of search results. And if the intranet team have done their stuff on the metadata for document landing pages, the quality of search results should also improve, as only web pages will be indexed.
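As a rough illustration of "only web pages will be indexed", here is a small Python sketch of a filter that a crawler or indexing job could apply before adding a URL to the search index. The function names, the list of file extensions and the example URLs are assumptions of mine; a real search product will have its own way of excluding file types.

```python
import posixpath
from urllib.parse import urlparse

# Assumed list of document types to keep out of the index.
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def is_indexable(url: str) -> bool:
    """Return True for web pages, False for PDFs and other documents."""
    path = urlparse(url).path                       # drops any ?query or #fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension not in DOCUMENT_EXTENSIONS


def urls_to_index(urls):
    """Keep only the URLs that should go into the search index."""
    return [url for url in urls if is_indexable(url)]


candidates = [
    "http://intranet.example/hr/annual-leave-policy.html",  # landing page
    "http://intranet.example/docs/annual-leave-policy.PDF",  # the document
]
indexed = urls_to_index(candidates)
```

With a filter like this, a search for "annual leave" would return the landing page, and the user would reach the PDF from there with its context already in front of them.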

You can learn more about metadata from James Robertson’s article in the Step Two blog.

(Many thanks to EJeffson for his CC Flickr PDF icon)

6 Responses to “Simplify your intranet by using PDFs – but DON’T make documents searchable”

  1. aLee Says:

    this couldn’t come at a better time..
    thanks for posting this dude

    cheers

  2. patrick c walsh Says:

    aLee
    Many thanks for the feedback

  3. Jeff Witt Says:

    Wow. All the vision-impaired people out there (who read image-only PDFs as “blank”) should thank you. Maybe you should give a talk about this at the next CHI conference.

    • patrick c walsh Says:

      Jeff,
      Many thanks for your comment. I’d be grateful if you could expand on your point.
      I had actually asked an accessibility expert prior to posting who confirmed that screen readers such as Jaws, with some adjustment, should have little problem with PDFs and that the ability to zoom in could be of help to the partially sighted. Providing Jaws users with the context before they open a document might save them a lot of time as well.

  4. Don Says:

    “PDFs look nice and crisp when viewed on screen” ???

    I think you must have your formats a bit confused. Compared to pretty much ANY other format, pdf’s look more blurred. Open one up side by side with any other format, html, doc, whatever, and compare them. You will see that pdf always looks inferior in comparison.

    • patrick c walsh Says:

      Don,
      I have been using PDF software since it was invented so I am fairly certain that I have my formats correct!
      Anyone else have a view?

      Patrick

