[I've written this for WSS 3.0, but if you're using MOSS 2007 the steps are virtually identical. You might just need to do a little Ctrl+F in regedit to find the appropriate key location (I think it's under OfficeServer instead of Shared Tools, something like that).]
To install support for PDF / MDI / TIF / XPS indexing in SharePoint (WSS) 3.0
Sorry for the delay. Yeah, MDI indexing was crucial for me as well, since the file sizes are much smaller. The search filter settings live in 3 different locations, but they are virtually identical in nature: they all relate to specific registry keys.
First off, you need to make sure the MDI filter is installed in SQL Server. Install Microsoft Office Document Imaging (MODI) from the Office 2003 or Office 2007 setup; it is under Additional Tools if you do a custom installation. Then, in SQL Server, run the following query to list the installed filters: select document_type, class_id, version from sys.fulltext_document_types. This returns the extensions being filtered along with the CLSID of the component handling each one. If you have an entry for .mdi files, copy the CLSID somewhere you can reference later. If you do not see .mdi listed but you have MODI installed, run the following SQL to enable the iFilter (something about Microsoft not signing the component, so SQL Server blocks it by default):
Code Snippet
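-- let SQL Server load filters registered with the operating system
exec sp_fulltext_service 'load_os_resources', 1;
-- skip signature checks so unsigned filters (such as MODI) are allowed to load
exec sp_fulltext_service 'verify_signature', 0;
go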
After this, re-run the first query to verify that .mdi files are now associated with the correct component (the CLSID should be 62160CBE-AFCB-4795-9B68-DDE5BA6D2524).
With that done, it's time to modify the registry. With WSS 3.0 installed, navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0 and export everything under this key as a backup.
Navigate to the subkey HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\89a9d417-5231-45b7-a9f5-08d4e043d8ce\Gather\Search\Extensions\ExtensionList. The GUID in your path may differ from mine, but the structure should be similar. In the extension list you will see numbered values with extension types as the data. Add the extensions you're interested in indexing (I added values 38, 39 and 40 with data of "pdf", "mdi" and "xps" respectively).
Next, navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Default. Here there is an entry called "DefaultExtList" that lists the extensions handled by SharePoint's default iFilter. I made sure to remove "Tif" and "Tiff" from this list, as they need to be handled by the MODI filter.
Next, navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension. This is where SharePoint maps extensions to the components that handle them. Insert keys following the existing layout, adding your extensions along with the CLSID of your MODI component (should be 62160CBE-AFCB-4795-9B68-DDE5BA6D2524). On a side note, if you pulled the iFilter list out of SQL Server in the first step, feel free to set up PDF indexing along the way: just point the PDF extension at the CLSID found in your SQL Server. Also take care to point TIF and TIFF files to the MODI component.
Last step here: I went to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\Filters and followed the existing structure to tie my PDF and TIF extensions to the appropriate MIME types ('application/pdf' and 'image/tiff' respectively).
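If you would rather keep these edits as a .reg fragment you can re-merge later, something like the following Python sketch can build the text for the ExtensionList and Filters\Extension parts. Treat it as a rough sketch under assumptions: the helper name build_reg_fragment is made up for illustration, and the layout it writes for Filters\Extension (one subkey per extension with a leading dot, whose default value is the CLSID in braces) is an assumption you should check against the existing entries in regedit before merging anything. The application GUID is the one from my server, and the value numbers 38 to 40 are just the ones that were free for me.

```python
# Sketch only: builds .reg text, it does not touch the registry.
BASE = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search"
APP_ID = "89a9d417-5231-45b7-a9f5-08d4e043d8ce"  # GUID from the post; yours may differ
MODI_CLSID = "62160CBE-AFCB-4795-9B68-DDE5BA6D2524"  # MODI filter CLSID from the post


def build_reg_fragment(numbered_exts, clsid_by_ext):
    """Return .reg text for the ExtensionList values and extension-to-CLSID keys.

    numbered_exts: {38: "pdf", ...} value number -> extension.
    clsid_by_ext:  {"mdi": "<clsid>", ...} extension -> handler CLSID (no braces).
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    lines.append(f"[{BASE}\\Applications\\{APP_ID}\\Gather\\Search\\Extensions\\ExtensionList]")
    for number, ext in sorted(numbered_exts.items()):
        lines.append(f'"{number}"="{ext}"')
    for ext, clsid in clsid_by_ext.items():
        lines.append("")
        # Assumed layout: subkey ".ext" with the CLSID as its default value.
        lines.append(f"[{BASE}\\Setup\\ContentIndexCommon\\Filters\\Extension\\.{ext}]")
        lines.append(f'@="{{{clsid}}}"')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_fragment(
        {38: "pdf", 39: "mdi", 40: "xps"},
        {"mdi": MODI_CLSID, "tif": MODI_CLSID, "tiff": MODI_CLSID},
    ))
```

The PDF CLSID is deliberately left out, since it depends on which PDF iFilter your SQL Server query reports. The DefaultExtList and MIME type edits are also not covered; those are easier to do by hand while looking at the existing entries.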
With this done, do another backup of this entire part of the registry in case things need to be redone. I originally thought these changes would do the trick, but after a server reboot I found that all the registry settings I had modified were automatically reverted to the originals (I was pretty pissed). If this happens, you can simply run the backed-up .reg file to merge your changes right back in.
After putting my registry back, I figured there must be saved copies of this part of the registry somewhere (SharePoint must compare against its stored defaults and overwrite the registry if it notices changes or corruption). The copy is actually held in your SharePoint SQL Server database as a binary field in a table. Specifically, in your Search database, the table MSSConfiguration has a record with Name='RegistryBlob' whose BigValue field holds the binary data containing the original registry information.
I wrote a little .NET application that retrieved the bytes from this record and wrote them to the file system. Open the file in a text editor and you will see it is just another .reg file. You need to make the same updates in this file that you made to the actual registry, so that SharePoint does not see any differences when it compares. This is where cutting and pasting from the backup .reg files I mentioned previously makes things very easy. Once the file is updated, have your application read the bytes from the file and update the existing record with the new binary data.
Lastly, searching around the hard drive I found another file, located around Program Files\Common Files\Microsoft Shared\etc..., called 'registryblob.reg', which is a third copy of the same registry information. I simply overwrote this file (after making a backup) with the same file I was uploading back to the MSSConfiguration table. With this done, you should be good to go.
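Since the same edits have to land in several copies, a quick check that each copy really contains them can save a reboot cycle. Here is a minimal Python sketch of that idea. The function names are hypothetical, and the decoding step is a guess: it assumes the .reg data is either UTF-16 with a byte order mark or plain single-byte text, which you should confirm by opening your own files.

```python
def decode_reg_bytes(raw):
    """Decode .reg bytes to text.

    Assumption: the data is UTF-16 with a BOM, or else single-byte text.
    """
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return raw.decode("utf-16")  # the BOM picks the byte order and is dropped
    return raw.decode("latin-1")


def missing_entries(reg_text, expected_lines):
    """Return the expected lines that do not appear in reg_text.

    Lines are compared whole, ignoring case and surrounding whitespace.
    """
    present = {line.strip().lower() for line in reg_text.splitlines()}
    return [e for e in expected_lines if e.strip().lower() not in present]


if __name__ == "__main__":
    # Example: check an exported copy for two of the ExtensionList values.
    sample = '"38"="pdf"\r\n"39"="mdi"\r\n'.encode("utf-16")
    print(missing_entries(decode_reg_bytes(sample), ['"38"="pdf"', '"40"="xps"']))
```

Run the same list of expected lines against the registry export, the file pulled from the database, and registryblob.reg; an empty result for all of them means the copies agree on those entries.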
Again, we're making the same registry modifications in 3 different locations: once in the actual registry, once in the registryblob.reg file on the hard drive, and once in the binary data held in the Search database table MSSConfiguration. Reboot the server, upload a .mdi/.tif/.pdf file, wait 10 minutes or so, and search for content in the file. Should be jackpot. The following is the code I used to get the binary out of and back into the table (pretty basic stuff):
Code Snippet
'Read binary to local file
'SqlConnection4 is an existing SqlConnection (assumed to point at the Search database)
Dim readSql As String = "SELECT BigValue FROM MSSConfiguration WHERE Name = 'RegistryBlob'"
Dim readCmd As New SqlClient.SqlCommand(readSql, SqlConnection4)
Dim bytes() As Byte = Nothing
Try
    readCmd.Connection.Open()
    'ExecuteScalar returns Object, so cast it to a byte array
    bytes = CType(readCmd.ExecuteScalar(), Byte())
Finally
    readCmd.Connection.Close()
End Try
'Write the blob out as-is; it opens as a normal .reg file
System.IO.File.WriteAllBytes("c:\db_original.reg", bytes)
MsgBox("done")
'Write binary back to database
Dim writeSql As String = "UPDATE MSSConfiguration SET BigValue = @data WHERE Name = 'RegistryBlob'"
Dim writeCmd As New SqlClient.SqlCommand(writeSql, SqlConnection4)
'Read the edited .reg file in one call
Dim data() As Byte = System.IO.File.ReadAllBytes("c:\db_modified.reg")
'Size -1 means MAX, so blobs larger than 8,000 bytes are accepted
writeCmd.Parameters.Add("@data", SqlDbType.VarBinary, -1).Value = data
Try
    writeCmd.Connection.Open()
    writeCmd.ExecuteNonQuery()
    'Only report success if the update actually ran
    MsgBox("done")
Catch ex As Exception
    MsgBox(ex.Message)
Finally
    writeCmd.Connection.Close()
End Try
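Before writing the modified file back to the database, it is worth confirming that it differs from the original only where you meant it to. The following Python sketch lists the added and removed lines between two .reg texts using the standard difflib module. The function name changed_lines is made up for illustration, and it works on already-decoded text, so you need to read c:\db_original.reg and c:\db_modified.reg with whatever encoding your blob actually uses.

```python
import difflib


def changed_lines(original_text, modified_text):
    """List lines removed ("- ") and added ("+ ") between two .reg texts."""
    a = original_text.splitlines()
    b = modified_text.splitlines()
    changes = []
    # autojunk=False keeps difflib from ignoring lines that repeat often
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes += ["- " + line for line in a[i1:i2]]
        changes += ["+ " + line for line in b[j1:j2]]
    return changes


if __name__ == "__main__":
    before = '"37"="xml"\n'
    after = '"37"="xml"\n"38"="pdf"\n"39"="mdi"\n'
    for line in changed_lines(before, after):
        print(line)
```

If the list shows anything beyond the extension, CLSID and MIME type edits described above, fix the file before running the update.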