Indexing Excel Spreadsheets using htDig
The htDig intranet search engine does not provide a filter for Excel spreadsheets. However, the xlHtml
Excel-to-HTML converter lets htDig index, and thus search, Microsoft
Excel files.
HtDig lets you define an external parser for
indexing any document type.
Usually this is a script called parse_doc.pl, available from the htDig
website.
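As I understand htDig's external_parsers documentation, htDig calls the parser with four arguments (the file to parse, its content type, its URL and the configuration file) and reads tab-separated records from the parser's standard output, each starting with a one-letter record type such as t (title), w (word) or h (excerpt). The Python sketch below emits such records for a plain-text input. The exact record layout, the 0 to 1000 location range and heading level 0 are assumptions to check against the documentation for your htDig version. An Excel file would first have to be converted to text, which is the job xlHtml does in this setup.

```python
import re
import sys

# Assumption: a "word" is a run of ASCII letters and digits.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def records_for_text(title, text):
    """Build htDig-style external-parser records (assumed layout) for plain text."""
    lines = ["t\t" + title]  # title record
    words = WORD_RE.findall(text)
    n = len(words)
    for i, word in enumerate(words):
        # Normalized position of the word in the document, 0..1000 (assumed range).
        location = (i * 1000) // n if n else 0
        # Last field: heading level, 0 meaning ordinary body text (assumed).
        lines.append("w\t%s\t%d\t0" % (word.lower(), location))
    # Excerpt for the search result summary, whitespace collapsed so the
    # record stays on one line and contains no stray tabs.
    lines.append("h\t" + " ".join(text.split())[:200])
    return lines


def main(argv):
    # Assumed call convention: parser infile content-type URL config-file
    infile, content_type, url, config = argv[1:5]
    with open(infile, encoding="latin-1") as fh:
        text = fh.read()
    for line in records_for_text(url, text):
        print(line)


if __name__ == "__main__":
    if len(sys.argv) < 5:
        print("usage: parser infile content-type URL config-file", file=sys.stderr)
    else:
        main(sys.argv)
```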
If you want to preserve your original parser file, follow this instruction;
otherwise, use the standard instruction below. Some troubleshooting
hints are given at the end.
Standard Instruction
- download xlHtml from www.xlHtml.org and install it
(in this example it is installed under /usr/local; otherwise the path needs to be updated in the file from step
3)
- make sure the following two lines are included in mime.types
application/msexcel xls
application/vnd.ms-excel xls
- copy the new parse_doc.pl
to /usr/doc/packages/htdig/contrib
(or wherever your parse_doc.pl is currently installed);
the path to xlHtml from step 1 may need to be updated in this file
- edit your htdig configuration file (usually /opt/www/htdig/conf/htdig.conf)
and add the following to the external_parsers section
application/msexcel /usr/doc/packages/htdig/contrib/parse_doc.pl \
application/vnd.ms-excel /usr/doc/packages/htdig/contrib/parse_doc.pl
(remember to check the path to parse_doc.pl from step 3)
- run rundig, then use htDig to search for a word contained in the Excel file
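For the first step, the parser script has to call xlHtml by its full path, so it helps to confirm where the binary ended up. The Python sketch below assumes the program is named xlhtml and that installing with prefix /usr/local puts it in /usr/local/bin (the usual autoconf layout); if it is not there, it falls back to searching PATH.

```python
import os
import shutil


def find_xlhtml(prefix="/usr/local"):
    """Return the path of the xlhtml binary, or None if it cannot be found.

    Checks <prefix>/bin/xlhtml first (assumed install location), then PATH.
    """
    candidate = os.path.join(prefix, "bin", "xlhtml")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return shutil.which("xlhtml")


if __name__ == "__main__":
    print(find_xlhtml() or "xlhtml not found")
```

Whatever path this reports is the one to use when updating parse_doc.pl.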
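To check the mime.types step, the Python sketch below reads text in the usual mime.types format (a MIME type followed by its extensions, with # starting a comment) and reports which of the two Excel types listed above are not yet mapped to the xls extension. Which mime.types file your installation actually reads depends on your setup.

```python
REQUIRED_EXCEL_TYPES = ("application/msexcel", "application/vnd.ms-excel")


def types_for_extension(mime_types_text, ext):
    """Return every MIME type that the mime.types text maps to extension ext."""
    found = []
    for raw in mime_types_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        mime, exts = fields[0], fields[1:]
        if ext.lower() in (e.lower() for e in exts):
            found.append(mime)
    return found


def missing_excel_types(mime_types_text):
    """List the required Excel MIME types that are not mapped to 'xls'."""
    present = set(types_for_extension(mime_types_text, "xls"))
    return [t for t in REQUIRED_EXCEL_TYPES if t not in present]
```

For example, `missing_excel_types(open("/etc/mime.types").read())` returns an empty list once both lines are present (the file path here is only an example).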
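The new parse_doc.pl itself is not reproduced here. The Python sketch below only illustrates the general idea behind the parser step: pick a converter based on the content type htDig reports and run it on the file. The path /usr/local/bin/xlhtml follows the /usr/local install from step 1 and is an assumption, as is the assumption that xlhtml writes its HTML rendering to standard output. The real Perl script may detect file types differently, for example by inspecting the file header.

```python
import subprocess

# Assumed location of xlhtml after installing under /usr/local (step 1).
XLHTML = "/usr/local/bin/xlhtml"

# The two Excel content types registered in mime.types.
EXCEL_TYPES = {"application/msexcel", "application/vnd.ms-excel"}


def converter_command(content_type, infile):
    """Return the argv list of the converter for this content type, or None."""
    # Ignore parameters such as "; charset=..." and compare case-insensitively.
    ctype = content_type.split(";", 1)[0].strip().lower()
    if ctype in EXCEL_TYPES:
        return [XLHTML, infile]
    return None


def convert(content_type, infile):
    """Run the converter and return its standard output as bytes."""
    cmd = converter_command(content_type, infile)
    if cmd is None:
        raise ValueError("no converter for " + content_type)
    return subprocess.run(cmd, capture_output=True, check=True).stdout
```

Keeping the type-to-command mapping in one place makes it the single spot to edit when xlHtml lives somewhere other than /usr/local.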
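In htdig.conf, a trailing backslash continues an attribute value on the next line, and the external_parsers value alternates between a content type and the parser command. The Python sketch below reads configuration text that way and returns the (content type, command) pairs, which is a quick way to confirm that both Excel entries point at the right parse_doc.pl. It assumes each command is a single token without extra arguments.

```python
def external_parsers(conf_text):
    """Return (content_type, command) pairs from the external_parsers attribute."""
    # Join physical lines that end with a backslash into logical lines.
    logical, buf = [], ""
    for line in conf_text.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            buf += stripped[:-1] + " "
            continue
        logical.append(buf + stripped)
        buf = ""
    if buf:
        logical.append(buf)

    for line in logical:
        name, sep, value = line.partition(":")
        if sep and name.strip() == "external_parsers":
            fields = value.split()
            # Fields alternate: content type, then the parser command.
            return list(zip(fields[0::2], fields[1::2]))
    return []
```

An odd number of fields in the value would mean a content type or a path is missing; `zip` silently drops the unpaired last field, so comparing the pair count with what you expect is worthwhile.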
Troubleshooting
- make sure htDig is configured correctly and can find plain ASCII text files
- run rundig with the -v option for more details about the indexing process
- if all else fails, mail me at sh@haberer-online.de