What file formats does dtSearch support?
Last Reviewed: January 22, 2009
Article: DTS0103
Applies to: dtSearch 7.60 and later
Supported file formats
Automatically-recognized fields
Older file formats
Image file formats
dtSearch can automatically recognize, index, search and display
documents in the following current formats, including graphic
marking of hits and multiple hit and file navigation options.
HTML and PDF documents appear with all formatting, embedded
images and links intact, exactly as in the original document.
The dtSearch developer product can display XML files with XSL
formatting. dtSearch converts other file types to HTML for
display with highlighted hits. dtSearch uses its own built-in
file viewers for document parsing and display, unless otherwise
noted. All file formats are supported through the current
release versions, unless otherwise noted.
While extensions are provided to identify some
file formats below, dtSearch generally does not rely on
extensions to detect file formats. For example, a
Word document named "sample.mp3" would still be identified as a
Word document.
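The idea of identifying a file by its content rather than its name can be illustrated with a minimal Python sketch. This is not dtSearch's detection logic, which this article does not describe. The sniff_format helper and its small signature table are assumptions made for illustration, built from a few widely documented file signatures.

```python
# Illustrative only: a tiny content-based format sniffer.
# SIGNATURES and sniff_format are hypothetical names, not part of dtSearch.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy Word or Excel)"),
    (b"PK\x03\x04", "ZIP container (e.g. .zip, .docx, .xlsx)"),
    (b"\x1f\x8b", "GZIP"),
    (b"{\\rtf", "RTF"),
]


def sniff_format(data: bytes) -> str:
    """Guess a format from the leading bytes, ignoring any file name."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


# A legacy Word document keeps its OLE2 signature even when it is
# saved under a misleading name such as "sample.mp3".
header = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 8
print(sniff_format(header))
```

A production detector would need many more signatures and deeper inspection (for example, looking inside a ZIP container to tell a .docx from a plain archive), so this sketch shows only the principle.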
Related Topics
International language support: dtSearch supports all languages through Unicode support. See "Unicode Support" and "International Language Support".
SQL databases: See "How to index databases with the dtSearch Engine."
Dynamically generated content from ASP.NET, CMS, SharePoint and similar products (*.jsp, *.asp, *.aspx, *.php, etc.): See "How to use dtSearch Web with dynamically-generated web sites".
GroupWise, Lotus Notes, and other message archive formats: See "Email conversion tools".
To use IFilters to add support for unsupported formats: See "How to use dtSearch with IFilters".
For scanned document data that requires OCR: See "How to use dtSearch or dtSearch Web with OCR".
Supported file formats
Adobe Acrobat (*.pdf)
Ami Pro (*.sam)
ANSI Text (*.txt)
ASCII Text (see note 3)
ASF media files (metadata only) (*.asf)
CSV (Comma-separated values) (*.csv)
DBF (*.dbf)
EBCDIC
EML files (emails saved by Outlook Express) (*.eml)
Enhanced Metafile Format (*.emf)
Eudora MBX message files (*.mbx)
Flash (*.swf)
GZIP (*.gz)
HTML (*.htm, *.html)
JPEG (*.jpg)
Lotus 1-2-3 (*.123, *.wk?)
MBOX email archives (including Thunderbird) (*.mbx)
MHT archives (HTML archives saved by Internet Explorer) (*.mht)
MIME messages
MSG files (emails saved by Outlook) (*.msg)
Microsoft Access MDB files (see note 1) (*.mdb, *.accdb)
Microsoft Document Imaging (*.mdi)
Microsoft Excel (*.xls)
Microsoft Excel 2003 XML (*.xml)
Microsoft Excel 2007 (*.xlsx)
Microsoft Outlook/Exchange (see note 2)
Microsoft Outlook Express 5 and 6 message stores (*.dbx)
Microsoft PowerPoint
Microsoft PowerPoint 2007 (*.pptx)
Microsoft Rich Text Format (*.rtf)
Microsoft Searchable TIFF (*.tiff)
Microsoft Word for DOS (*.doc)
Microsoft Word for Windows (*.doc)
Microsoft Word 2003 XML (*.xml)
Microsoft Word 2007 (*.docx)
Microsoft Works (*.wks)
MP3 (metadata only) (*.mp3)
Multimate Advantage II (*.dox)
Multimate version 4 (*.doc)
OpenOffice 2.x and 1.x documents, spreadsheets, and presentations (*.sxc, *.sxd, *.sxi, *.sxw, *.sxg, *.stc, *.sti, *.stw, *.stm, *.odt, *.ott, *.odg, *.otg, *.odp, *.otp, *.ods, *.ots, *.odf) (includes OASIS Open Document Format for Office Applications)
Quattro Pro (*.wb1, *.wb2, *.wb3, *.qpw)
QuickTime (*.mov, *.m4a, *.m4v)
TAR (*.tar)
TIFF (*.tif)
TNEF (winmail.dat files)
Treepad HJT files (*.hjt)
Unicode (UCS16, Mac or Windows byte order, or UTF-8)
Windows Metafile Format (*.wmf)
WMA media files (metadata only) (*.wma)
WMV video files (metadata only) (*.wmv)
WordPerfect 4.2 (see note 3) (*.wpd, *.wpf)
WordPerfect (5.0 and later) (*.wpd, *.wpf)
WordStar versions 1, 2, 3 (see note 3) (*.ws)
WordStar versions 4, 5, 6 (*.ws)
WordStar 2000
Write (*.wri)
XBase (including FoxPro, dBase, and other XBase-compatible formats) (*.dbf)
XML (*.xml)
XML Paper Specification (*.xps) (version 7.40)
XSL
XyWrite (see note 3)
ZIP (*.zip)
[1] Databases. Beginning with version 7.54, dtSearch no longer
uses ODBC or any Microsoft database drivers to index Microsoft
Access files. Earlier versions relied on ODBC to parse Access
files. Each record of a database is indexed as a separate
document. For information on indexing SQL databases, see "How
to index databases with the dtSearch Engine."
[2] Outlook and Exchange. dtSearch Desktop can index Outlook and
Exchange message stores using MAPI. For more information, click
here.
[3] Older Word Processor Formats. dtSearch can index and
display, but cannot automatically recognize, documents in the
following formats:
WordPerfect 4.2
WordStar versions before 4
XyWrite
ASCII Text
In dtSearch Desktop, click Options > Preferences > File Types to
tell dtSearch how to identify these types of files.
[4] Web Sites. dtSearch Desktop/Network includes a spider that
can index and search dynamically-generated content or static
content on web sites. For more information, click here.
Automatically-detected fields
The dtSearch Engine automatically detects
fields in the following file formats:
| File format | Fields |
| Email files (Outlook Express, Eudora, MBOX, EML) | Sender, Recipient, Subject, Date, CC, BCC |
| Outlook items and .MSG files | Sender, Recipient, Subject, Sent Date, CC, BCC, contact fields (StreetAddress, CompanyName, etc.) |
| Microsoft Word, Excel, PowerPoint | Document summary information fields |
| OpenOffice/Open Document Format | Document properties fields |
| HTML | META tags; <TITLE> is indexed as HtmlTitle field; <H1>, <H2>, <H3> are indexed as HtmlH1, HtmlH2, HtmlH3, etc. |
| XML | All fields |
| DBF | All fields |
| CSV | All fields (CSV, or comma-separated values, files must have a .csv extension, a list of field names in the first line, and must use tab, comma, or semicolon delimiters) |
| PDF files | Document Properties |
| WordPerfect | Document summary information fields |
| MP3 | All metadata fields |
| JPG, TIFF | EXIF and IPTC metadata fields; XMP (Vista) metadata supported in version 7.40 |
| ASF, WMA, WMV | All metadata fields |
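The CSV requirements listed in the table (a .csv extension, field names in the first line, and tab, comma, or semicolon delimiters) can be checked before indexing with a short Python sketch. This only mirrors the three stated requirements and says nothing about how dtSearch itself parses CSV files. The csv_field_names function and the rule for picking the delimiter are assumptions made for illustration.

```python
import csv
from pathlib import PurePath

# Delimiters the article says are accepted in CSV files.
ALLOWED_DELIMITERS = ("\t", ",", ";")


def csv_field_names(filename: str, text: str):
    """Return the field names from the first line, or None if a check fails.

    Hypothetical helper: checks the .csv extension, requires a non-empty
    first line, and guesses the delimiter as the allowed character that
    occurs most often in that line.
    """
    if PurePath(filename).suffix.lower() != ".csv":
        return None
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        return None
    header = lines[0]
    delimiter = max(ALLOWED_DELIMITERS, key=header.count)
    return next(csv.reader([header], delimiter=delimiter))


sample = "name;email;city\nAnn;ann@example.com;Oslo\n"
print(csv_field_names("people.csv", sample))
print(csv_field_names("people.txt", sample))
```

The first call returns the three field names and the second returns None, because the same content under a .txt name fails the extension requirement.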
Other File Formats
dtSearch will still index, search, and display other file
formats, but they will be treated as binary file types. In other
words, binary codes and other non-text data will be displayed
along with the text. dtSearch can also use a proprietary binary
file filtering algorithm to clean up these file formats. For
more information, see Indexing Options in the dtSearch help
file.
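The general idea of filtering a binary file down to its readable text can be sketched in Python. dtSearch's filtering algorithm is proprietary and is not described in this article, so the code below uses a stand-in technique: extracting runs of printable ASCII characters, much as the Unix strings utility does. The extract_text_runs function and its min_length parameter are assumptions made for illustration.

```python
import re


def extract_text_runs(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of at least min_length printable ASCII characters.

    Hypothetical stand-in for binary filtering; not dtSearch's algorithm.
    """
    # \x20-\x7e covers the space character through the tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


blob = b"\x00\x01Quarterly report\x00\xff\x02ab\x00Budget 2009\x00"
print(extract_text_runs(blob))
```

Short runs such as "ab" are dropped because they are more likely to be noise than text. This sketch ignores non-ASCII encodings, which a real filter would need to handle.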
For legacy file types in which multiple messages or log entries
are stored in one very large text file, use the dtSearch File
Segmentation Rules feature to tell dtSearch how to break up the
file into multiple logical subdocuments. For more information,
see File Segmentation Rules in the dtSearch help file.
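The concept of breaking one large text file into logical subdocuments can be illustrated with a short Python sketch. It does not use or reproduce the File Segmentation Rules syntax, which is documented in the dtSearch help file. The split_on_marker function and the "From " separator pattern (the conventional start of a message in an mbox-style file) are assumptions chosen for the example.

```python
import re


def split_on_marker(text: str, marker: str = r"^From ") -> list[str]:
    """Split text into segments, each starting at a line that matches marker.

    Hypothetical helper for illustration; any text before the first
    marker is kept as its own leading segment.
    """
    starts = [m.start() for m in re.finditer(marker, text, flags=re.MULTILINE)]
    if not starts:
        return [text] if text else []
    if starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first marker
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


archive = "From alice Mon\nHello\nFrom bob Tue\nHi\n"
for number, segment in enumerate(split_on_marker(archive), start=1):
    print(number, repr(segment))
```

Each segment could then be treated as a separate document. Joining the segments reproduces the original text, so no content is lost by the split.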
Image Formats
dtSearch Desktop/Network can display images in
the following formats:
BMP
EPSF
GIF
IMG
JPEG
PCX
PNG
TIFF
Targa
WMF
WPG (WPG version 1.0 only)
When viewing multipage images, use PgUp and
PgDn to navigate between the pages. The dtSearch image viewer
also includes viewing options such as Zoom In, Zoom Out,
Invert, Rotate, etc.