Search

DocFetcher

Dec 03, 2021 rap

Source: ~/Dropbox/ANRL-Web-Site/ANRL_PAGES/DocFetcher.md



DocFetcher is an open source desktop search application that lets you search the contents of files on your computer. You can think of it as Google for your local files. The application runs on Windows, Linux and OS X, and is made available under the Eclipse Public License.

I found DocFetcher on Jul 20, 2021 after searching the internet for “text search multiple PDF documents on computer.” Installation was a little tricky, but I had it running in less than an hour, and Jim Sweeney and David Foote had it running after an hour as well. DocFetcher requires Java, and that was the tricky part.

DocFetcher first builds an index of the files in selected folders, then lets you search that index with a very sophisticated search engine. Jim, Dave and I will be testing it and updating this web page with the results.
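The index-then-search idea can be illustrated with a toy example. This is only a minimal sketch of a word index, not DocFetcher's actual implementation. The function names and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample documents, standing in for files in an indexed folder.
documents = {
    "minutes-2021-07.txt": "Board meeting minutes for July",
    "newsletter.txt": "Summer newsletter and meeting schedule",
}
index = build_index(documents)
print(sorted(search(index, "meeting minutes")))  # ['minutes-2021-07.txt']
```

Building the index is the slow step because every file has to be read once. After that, a search only looks up words in the index, which is why results come back quickly even for a large archive.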

Preliminary Results

  1. This application looks very promising for searching ANRL’s digital archives.
  2. Indexing ~/Dropbox/ANRL-COMMUNICATIONS took just a few seconds. It has emails, PDF meeting minutes and lots of miscellaneous items. I was able to get search results from test searches.
  3. I attached an external hard disk holding a copy of the NAS digital archive and was able to index that.
  4. So far, I cannot find the results of found text in the file system of my Mac. I’m still looking. I expected them to be somewhere in the Application Support folder.
  5. Indexing the United States Magazine archive (around 135 GB, 4,907 PDF files, 74 folders) took less than 4 hours. Some test searches produced results.
  6. Indexing the Foreign Magazine archive (around 130 GB, 7,185 PDF files, 137 folders) took less than 4 hours. Some test searches produced results.
  7. I was able to shut down the computer, restart it, and still see the index files.
  8. DocFetcher documentation is available inside the application. You can see some parts of the documentation on the DocFetcher Website. I copied the documentation to Evernote and this is a link to that page.
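Before indexing a large archive, it can help to know how many PDF files and folders it holds and how big it is, like the figures quoted for the magazine archives above. The following is a minimal sketch using only the Python standard library. The function name `survey_archive` and the example path are assumptions for illustration, and the script is independent of DocFetcher.

```python
import os


def survey_archive(root):
    """Count PDF files, folders, and total PDF size (bytes) below root."""
    pdf_count = 0
    folder_count = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folder_count += len(dirnames)
        for filename in filenames:
            if filename.lower().endswith(".pdf"):
                try:
                    total_bytes += os.path.getsize(os.path.join(dirpath, filename))
                except OSError:
                    continue  # Skip files that cannot be read (e.g. broken links).
                pdf_count += 1
    return pdf_count, folder_count, total_bytes


if __name__ == "__main__":
    # Hypothetical path; replace it with the folder you plan to index.
    pdfs, folders, size = survey_archive(os.path.expanduser("~/Documents"))
    print(f"{pdfs} PDF files, {folders} folders, {size / 1e9:.1f} GB")
```

Running this on an archive before and after copying it to another disk is also a quick way to check that nothing was lost in the copy.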

Tips

  1. When searching within a particular area of indexed files, turn off all other areas (uncheck the checkboxes by the folder names) except the one containing your area of interest. Ex: Magazines has several hundred titles (sub-folders), so if you are searching The Bulletin, turn off all sub-folders except The Bulletin.
  2. Don’t forget to do an Internet search in addition to searching ANRL’s archives. ANRL does not have all the world’s information on social nudity.
  3. After refining a search, you can select all file names and then output a CSV file. This can be kept or sent to the patron who asked for the search.
  4. Search results can be sorted by clicking on the header titles in the search results pane.
  5. Help files are available on the web. Just search on DuckDuckGo or Google.
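If a patron's request takes several searches, the CSV files saved from each one can be combined into a single file before sending. The sketch below makes no assumption about which columns DocFetcher writes. It simply copies rows and skips exact repeats. The function name `merge_csv_exports`, the file names, and the UTF-8 encoding are assumptions for illustration.

```python
import csv


def merge_csv_exports(input_paths, output_path):
    """Copy rows from several CSV files into one, skipping repeated rows."""
    seen = set()
    kept = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in input_paths:
            # Assumes the exported files are UTF-8 encoded.
            with open(path, newline="", encoding="utf-8") as in_file:
                for row in csv.reader(in_file):
                    key = tuple(row)
                    if row and key not in seen:
                        seen.add(key)
                        writer.writerow(row)
                        kept += 1
    return kept


if __name__ == "__main__":
    import sys

    # Example: python merge_exports.py combined.csv search1.csv search2.csv
    if len(sys.argv) >= 3:
        count = merge_csv_exports(sys.argv[2:], sys.argv[1])
        print(f"Wrote {count} rows to {sys.argv[1]}")
```

Rows are kept in the order they first appear, so results from the first search stay at the top of the combined file.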

DocFetcher Pro Server

See writeup on this page.