David Fenton Associates

Class Module clFileSearch:
A Replacement for the Office FileSearch Object

--------------------------------------------

Skip the verbiage and download the sample database now. All you need for it to work is the class module -- it is completely independent of everything else in the database. The form and the other code modules are there simply to provide an example implementation. See the comments at the top of the class module for information on dependencies (or see Dependencies, below).

Contents

Introduction

In May 2009, I picked up a new client who stored important data in Word documents that were linked to their Access database. They needed to be able to search the full text of the files and then pull up the corresponding records in their database. I first thought that the Office FileSearch Object would be perfect, and it was! But then I discovered that the FileSearch Object has been removed from Office 2007, and since I didn't want to implement something in their Access 2003 database that would prevent them from upgrading to later versions of Access, I began seeking other solutions.

After looking at Windows Desktop Search and Google Desktop Search and a number of external file searching utilities, I realized that there was nothing out there that replicated the functionality of the Office FileSearch Object. The desktop search solutions depended on indexing software running on workstations, or in the case of WDS, required a Windows server (this client lacked one). And all the numerous and very fine standalone external utilities had all the problems that come with any non-automatable out-of-process solution.

I felt it was important to be able to search both full text and Office document properties. The full-text search was fairly easy to implement using standard VBA file I/O functions. The properties search I was able to implement using an unsupported Microsoft DLL, the DSO OLE Document Properties Reader (dsofile.dll).

So, this page is a description of the results. If you want to read the details of how I arrived at the decision to write a class module to replace the Office FileSearch Object, you can read the thread that resulted from my post on the subject in the microsoft.public.access newsgroup. Some further discussion ensued when I created a new thread to announce the class module, but this article incorporates all the significant points in those threads and adds new material relating to features added since those postings.

Properties and Methods of clFileSearch, Class Module to Replace Office FileSearch Object

Here is a list of the properties and methods in clFileSearch. I list here all the properties/methods of the Office FileSearch Object even if I didn't implement them in the class. I also added some new properties. The properties are read-only unless specified (w=write, r/w=read/write)

PROPERTIES
ApplicationNot implemented (irrelevant)
Creator (r/w)implemented
ExecuteCancel (r/w)added by DWF
FileName (r/w)implemented
FilesToCheckadded by DWF
FileType (r/w)implemented
FileTypesimplemented
FileTypeSpecify (w)added by DWF
FoundFilesimplemented
FSO (File System Object)added by DWF
LastModified (r/w)implemented
LastModifiedSpecify (w)added by DWF
LastModifiedSpecifyEndadded by DWF
LastModifiedSpecifyStartadded by DWF
LookIn (r/w)implemented
MatchAllWordFormsNot implemented
MatchTextExactly (r/w)implemented
MatchTextType (r/w)added by DWF
ProgressBarForm (w)added by DWF
PropertyTestsNot implemented
SearchFoldersNot implemented
SearchPropertiesOnly (r/w)added by DWF
SearchScopesNot implemented
SearchSubFolders (r/w)implemented
TextOnly (r/w)added by DWF
TextOrProperty (r/w)implemented
METHODS
Executeimplemented -- extended by DWF
NewSearchimplemented
RefreshScopesNot implemented

Download the sample database now.

Comments About the Implementation

I could not implement MatchAllWordForms, unfortunately, as that depends on functionality that is not exposed anywhere that I know of.

I didn't feel that the SearchFolders and SearchScopes collections were worth implementing, as they are aspects of the FileSearch object as it exists that make it incredibly complicated to work with. I also didn't implement the PropertyTests collection as I couldn't see much utility in it. Because I didn't implement the SearchScopes collection, there was no need to implement the RefreshScopes method.

By the way, if you're wondering about a good reference for the Office FileSearch object, all you need to do is open the VBAOF11.CHM (or 10; you won't find anything on it in VBAOF12.CHM, because there is no such file any longer) help file and search for "filesearch." It's pretty complete, and it's what I used to work things out. I don't know why the VBE can't pull up that help file when you're coding for the FileSearch Object, though.

Dependencies

To work as designed, you also need to download the DSO OLE Document Properties Reader:

http://support.microsoft.com/kb/224351 (version 2.1)

...and register the DSOFile.dll. The class module should be able to be used for full-text searches without needing to install and register this DLL (I used late binding so that if it's not registered, it doesn't cause fatal errors for non-property-based searches). Also, the TextOnly property was implemented specifically for the purpose of insuring that a text-only search would bypass any reference to the DSO library.

See Searching File Properties below for details.

Comments on Three Enumerations

I implemented two of the Office enumerations, for MsoLastModified and for MsoFileType. Both raise significant questions as to what they mean and how they should be used, but I implemented them for consistency with the FileSearch Object. In both cases, I don't know how to interpret the meaning of the constants, which have these names:

msoLastModified Enumeration
msoLastModifiedAnyTime
msoLastModifiedLastMonth
msoLastModifiedLastWeek
msoLastModifiedThisMonth
msoLastModifiedThisWeek
msoLastModifiedToday
msoLastModifiedYesterday

(I also added msoLastModifiedSpecify since it was crucial to implement additional functionality that I wanted to add)

AnyTime, Today and Yesterday are quite clear, of course. But it's not clear to me hnow LastMonth, LastWeek, ThisMonth and ThisWeek should be interpreted. I interpreted them respectively as:

msoLastModified Interpretation
msoLastModifiedLastMonthBetween DateAdd("m", -2, Test Date) And DateAdd("m", -1, Test Date)
msoLastModifiedLastWeekBetween DateAdd("ww", -2, Test Date) And DateAdd("ww", -1, Test Date)
msoLastModifiedThisMonthBetween DateAdd("m", -1, Test Date) And Date()
msoLastModifiedThisWeekBetween DateAdd("ww", -1, Test Date) And Date()

Another interpretation for the months would be that when the current month is May, the previous month would be April, and likewise with week (current week beginning on the most recently passed Monday, and last week being the week before that).

The logic that uses this enumeration is in the class module's LastModified Property Let and anyone could easily alter the definitions there to suit their needs. MS hasn't implemented it consistently in any of the places I looked, which included the WinXP search companion, Windows Desktop Search and the Office FileSearch UI -- each one has different choices and they aren't consistent with the choices defined for the Office FileSearch object. I didn't test what the results were, because I didn't have files with appropriate dates to check on. I figured it's the kind of thing you'll either not use at all (in which case it doesn't matter how I implement it), or it's something you'll use but probably want to adjust the interpretation for your own users. In other words, no matter what I implemented, it's probably not going to get used in the exact form I did it if it's used at all.

Anyway, in the sample database, I implemented a combo box of the enumeration types, while also allowing the user to type in a specific date. If you enter anything into the combo box other than one of the predefined choices, the combo box's AfterUpdate event assumes you intended to type a specific date, and checks to see if it's valid. If not, it prompts you to enter a valid date. Once the valid date is entered, the TO textbox is enabled so you can enter a range.

The other enumeration I implemented is MsoFileType, which is an enumeration of all the MS Office file types. It is documented here:

http://msdn.microsoft.com/es-es/library/microsoft.office.core.msofiletype(VS.80).aspx

(I don't know why I ended up with the Spanish version)

I used that as a guide, but didn't follow it exactly. I also added the OpenXML document extensions (e.g., xlsx, docx, etc.). For anyone who uses it, note that the .FileType property handles only the file types defined in the enumeration, which are:

msoFileType Enumeration
msoFileTypeAllFiles
msoFileTypeBinders
msoFileTypeCalendarItem
msoFileTypeContactItem
msoFileTypeDatabases
msoFileTypeDataConnectionFiles
msoFileTypeDesignerFiles
msoFileTypeDocumentImagingFiles
msoFileTypeExcelWorkbooks
msoFileTypeJournalItem
msoFileTypeMailItem
msoFileTypeNoteItem
msoFileTypeOfficeFiles
msoFileTypeOutlookItems
msoFileTypePhotoDrawFiles
msoFileTypePowerPointPresentations
msoFileTypeProjectFiles
msoFileTypePublisherFiles
msoFileTypeTaskItem
msoFileTypeTemplates
msoFileTypeVisioFiles
msoFileTypeWebPages
msoFileTypeWordDocuments

I added a custom type, msoFileTypeUserSpecified, so that the class module had a mechanism for handling other file types. The .FileTypeSpecify property takes a single extension or a comma-delimited list of extensions. The extension can be passed with or without the wildcard character (everything before the . is ignored). So you could assign "doc" or "*.doc" to the property and it would have the same result, and likewise, "doc, csv, txt" or "*.doc, *.csv, *.txt".

As I said, I interpreted the meaning of these constants more broadly than the documentation said. For instance, there was no Excel Spreadsheet choice that defined xls, xlt, etc. as the valid file types, but only the ExcelWorkbooks constant. So I defined that as being these extensions: xls, xlt, wbk, xlsx, xlsm, xltx, xltm, xlsb, xlam (I don't know if there is an OpenXML wbk format -- I can't find wbkx in Google, except as radio station call letters!).

Note also that for now, I've commented out all the extensions for template files except for the msoFileTypeTemplates file type.

If in an application you need to provide a selection of file types beyond the Office file types, it would be better to utilize only the .FileTypeSpecify property since you could, for instance, provide a multiselect listbox that would allow the selection of multiple file types and then simply pass a comma-delimited list to the property. I implemented the enumeration of Office documents in order to be consistent with the Office FileSearch object but this somewhat restricts the use of the .FileType property of the class module. However, it does insure that the interface is completely consistent in that regard with the FileSearch object -- as a replacement for it, it will work fine.

But in general, I suspect that it will be more convenient to use the .FileTypeSpecify property and use a table of file types to populate a list box (or combo box) for the user to choose from.

A new enumeration that I implemented is for use with the new MatchTextType property:

Public Enum osfMatchTextType
  osfMatchTextTypeMatchAll = 1
  osfMatchTextTypeMatchExact = 2
  osfMatchTextTypeMatchAny = 3
End Enum

This overlaps with the MatchTextExactly property that is there for consistency with the Office FileSearch Object. But this implementation is more flexible.

The example form features an example of how to implement it, and the searches all work with it.

Searching File Properties

In order to provide the ability to search OLE document properties (as provided in the Office FileSearch object), I utilized an unsupported Microsoft ActiveX DLL, the DSO OLE Document Properties Reader:

http://support.microsoft.com/kb/224351 (version 2.1)

As I said above, I wrote the class module with late binding so that if it's not registered, it shouldn't be fatal. Without it, the property searching doesn't work.

The implementation of this was very complicated, as I had to convert VB6 code to figure out how it works, and it turned out that the collection that the object returns (if you browse it with the object viewer, it's the DSOFile.OleDocumentProperties.SummaryProperties collection) is not loopable (as opposed to the DSOFile.OleDocumentProperties.CustomProperties collection, which worked just fine). I had to hard code each property retrieval one at a time and that resulted in some very ugly code. Here's the code for checking date properties:

  Private Function SearchPropertiesDate(dteSearch As Date, _
     Optional strPropertyName As String) As Boolean
    Dim bolSingleProperty As Boolean
    Dim bolTemp As Boolean
    Dim dteLastSaved As Date

    bolSingleProperty = (Len(strProperty) > 0)
    If bolSingleProperty Then
      Select Case strProperty
        Case "DateLastSaved"
          GoTo DateLastSaved
        Case "DateCreated"
          GoTo DateCreated
        Case "DateLastPrinted"
          GoTo DateLastPrinted
      End Select
    End If
    If dteSearch = 0 Then GoTo exitRoutine
  DateLastSaved:
    bolTemp=InStr(DSOFileDoc.SummaryProperties.DateLastSaved,dteSearch)
    If bolTemp Or bolSingleProperty Then GoTo exitRoutine
  DateCreated:
    bolTemp=InStr(DSOFileDoc.SummaryProperties.DateCreated, dteSearch)
    If bolTemp Or bolSingleProperty Then GoTo exitRoutine
  DateLastPrinted:
    bolTemp=InStr(DSOFileDoc.SummaryProperties.DateLastPrinted,
    dteSearch) If bolTemp Or bolSingleProperty Then GoTo exitRoutine

  exitRoutine:
    SearchPropertiesDate = bolTemp
  End Function

Let me explain that code. First, when searching all the properties, I wanted it to short circuit -- that is, as soon as the first property matched the criteria, I wanted to return True and exit. This is why it has all those tests and "GoTo exitRoutine" jumps -- it's simply that there's no utility in knowing that two properties matched the same search string.

Second, I also wanted to be able to provide the capability of searching for specific properties. The only way I could think to do that was with GoTos, which is why it's so incredibly ugly!

But it's all caused by the lack of looping through the collection. The code sample that came with the download of the DLL also didn't walk through the SummaryProperties collection, so I doubt that it's possible. If someone can figure it out, I'd gladly rework that code to make it less stupid! For now, it does work, though.

Because some might want to use the class module without the DSO File Properties DLL, I also implemented a new Boolean property, TextOnly, that causes the property search to be skipped if it's set to True. It still searches the properties, of course, since the properties are part of the text (this is why I didn't implement it before, because there could not be any difference between TextOnly and TextOrProperties), but in order to allow the class module to be used without installing the DSOFile DLL, I think it's a good idea. You just have to be sure you understand that there is actually never going to be any difference between the the results returned with either setting.

Search Results May Not Be Identical to FileSearch Object

I've tried to be as flexible and commonsensical as possible in my implementation of the searching, but there's a major interaction between the logic of the class module's searching functionality and UI. Let me explain.

The main guts of the search process is in a subroutine called SearchFileForText. In it, first the files are searched for text, then for last update (through the Access FileDateTime() function), and then for the OLE properties. Whether or not SearchFileForText returns true depends on two things:

  1. the results of the three searches.
  2. the appropriate combination of those three results.

Now, because there was no way for me to have a text search ignore properties embedded in the header of the document, I originally made a decision to allow a choice only between searching Properties Only or Both Properties and Text. This is the way the FileSearch object actually implements it so far as I can tell (the UI in Office has never given you the choice of searching text only, just both or properties only), so I would tend to think it's not doing anything special about reading only certain parts of the OLE stream.

But in order to decouple the class module from the DSO dll, I also added the TextOnly property, which skips the property search. As explained above in the section on searching file properties, there isn't actually any difference in the results between searching the text only and searching both properties and text, because the file's full text stream is searched in either case, so properties embedded in the header are going to be matched either way. But that option allows the code to bypass the dependency on the DSO dll, which having only a choice between searching properties alone (SearchPropertiesOnly) and searching both properties and text (TextOrProperty) would not have permitted.

Clarifications on How to Use the Sample Database

After I posted about my work in microsoft.public.access, I got this response from one of the few people brave enough to try it out:

If you accidentally type a non existing folder, it would be nice to get a "Folder doesn't exist" warning.

While it's certainly true that the version of the sample database he was working with had that flaw, I want to stress that the problem he encoutered was a UI issue specific to the sample form, and not an issue about the class module itself. The form in the demo database is not intended as a recommended implementation of the UI for the class module, only an example that demonstrates what the class module can do. Certainly in any real implementation, you'd want to trap for that, but that happens at the form level, not in the class module's code.

I did end up addressing this in the demo form after that comment, by first making the FSO property of the class module public, and then coding the textbox's BeforeUpdate event thus:

  Private Sub txtSearchFolder_BeforeUpdate(Cancel As Integer)
    If Not clsFileSearch.FSO.FolderExists(Me!txtSearchFolder) Then
       MsgBox "That folder does not exist.", vbExclamation, _
          "Folder doesn't exist!"
       Cancel = True
       Me!txtSearchFolder.Undo
    End If
  End Sub

That's code that went in the sample form implementation, not in the class module itself.

Some Late UI-related Additions

There are three late additions to the class module properties that are not discussed above, added for the purpose of better hooking the UI into the functionality of the class module. Here is a brief description of each:

--------------------------------------------

--------------------------------------------