Search Features - International Languages

Unicode Support

dtSearch's Unicode support allows indexing and searching of non-English text in every character set supported by the Unicode standard.

In addition to Unicode support, dtSearch offers extensive alphabet customization options.

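Alphabet customization can be pictured as a character-classification table that decides which characters form words. The Python sketch below is a generic illustration, not dtSearch's alphabet file format or API: the `Alphabet` and `CharClass` names, the default that treats every Unicode alphanumeric character as a letter, and the hyphen handling are all assumptions made for the example.

```python
from enum import Enum


class CharClass(Enum):
    LETTER = "letter"   # part of a word
    HYPHEN = "hyphen"   # kept inside a word, trimmed from its edges
    BREAK = "break"     # ends the current word


class Alphabet:
    """Hypothetical character-classification table (not dtSearch's alphabet format)."""

    def __init__(self, extra_letters="", hyphens="-"):
        self.extra_letters = set(extra_letters)
        self.hyphens = hyphens

    def classify(self, ch):
        # Unicode-aware default: any alphanumeric code point counts as a letter,
        # so accented Latin, Greek, Cyrillic, etc. need no special setup.
        if ch.isalnum() or ch in self.extra_letters:
            return CharClass.LETTER
        if ch in self.hyphens:
            return CharClass.HYPHEN
        return CharClass.BREAK


def tokenize(text, alphabet):
    """Split text into lowercase words using the alphabet's character classes."""
    words, current = [], []

    def flush():
        # Drop leading/trailing hyphens; skip fragments that were only hyphens.
        word = "".join(current).strip(alphabet.hyphens)
        if word:
            words.append(word)
        current.clear()

    for ch in text:
        cls = alphabet.classify(ch)
        if cls is CharClass.LETTER:
            current.append(ch.lower())
        elif cls is CharClass.HYPHEN:
            current.append(ch)
        else:
            flush()
    flush()
    return words


if __name__ == "__main__":
    text = "Straße und Café-Bar, O'Brien"
    print(tokenize(text, Alphabet()))
    print(tokenize(text, Alphabet(extra_letters="'")))
```

With the default table the apostrophe splits O'Brien into two words; adding it to `extra_letters` keeps the name whole. Because the default relies on Python's Unicode-aware `str.isalnum`, Cyrillic or accented Latin text needs no extra configuration in this sketch.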
See the Unicode FAQ for more technical information.

Language Extension Packs

The dtSearch product line includes an English noise word list and stemming rules. Stemming finds words that are linguistically related, such as learn, learned, learns, and learning.

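To illustrate what a stemming rule does, here is a deliberately tiny suffix-stripping stemmer. The rule list `SUFFIX_RULES` and its minimum-stem-length check are hypothetical; dtSearch's own stemming rules and their file format are not described here, and real rule sets cover many more suffixes and irregular forms.

```python
# Hypothetical suffix rules: (suffix, minimum stem length that must remain).
# Checked in order, longest suffix first.
SUFFIX_RULES = [("ing", 3), ("ed", 3), ("s", 3)]


def stem(word):
    """Strip the first matching suffix if enough of the word remains."""
    word = word.lower()
    for suffix, min_stem in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem_matches(query, words):
    """Return the words that reduce to the same stem as the query."""
    target = stem(query)
    return [w for w in words if stem(w) == target]


if __name__ == "__main__":
    vocabulary = ["learn", "learned", "learns", "learning", "lean", "earn"]
    print(stem_matches("learning", vocabulary))
    # ['learn', 'learned', 'learns', 'learning']
```

The minimum-stem check keeps short words such as "is" from being reduced to a single letter.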
dtSearch's UK distributor offers pre-packaged sets of noise word lists and stemming rules covering a wide variety of European languages.

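Noise words are common words that a search engine typically leaves out of its index. A minimal sketch of applying a per-language noise word list at indexing time follows; the word sets and the `index_terms` function are hypothetical and far smaller than the lists in a real Language Extension Pack.

```python
import string

# A handful of noise words per language, for illustration only; the lists
# shipped in the Language Extension Packs are not reproduced here.
GERMAN_NOISE_WORDS = frozenset({"der", "die", "das", "und", "ist", "ein", "eine"})
ENGLISH_NOISE_WORDS = frozenset({"the", "and", "is", "a", "an", "of"})


def index_terms(text, noise_words):
    """Return the lowercase words of text that are not noise words."""
    terms = []
    for raw in text.split():
        # Trim surrounding punctuation before the noise-word lookup.
        word = raw.strip(string.punctuation).lower()
        if word and word not in noise_words:
            terms.append(word)
    return terms


if __name__ == "__main__":
    print(index_terms("Die Katze ist eine Jägerin.", GERMAN_NOISE_WORDS))
    print(index_terms("The cat is a hunter.", ENGLISH_NOISE_WORDS))
```

Switching languages in this sketch only means passing a different set; the indexing logic is unchanged.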
The Western European group includes (in addition to English): Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish and Swedish.

The Eastern European group includes: Belarusian, Bulgarian, Czech, Estonian, Greek, Hungarian, Latvian, Lithuanian, Polish, Russian, Slovak, Slovenian, Turkish and Ukrainian.

See also the Cyrillic article.

Licensing: dtSearch Corp. can add either the Western European group or the Eastern European group to a signed dtSearch developer license. Please contact dtSearch for details.

More information on the Language Extension Packs

Request a trial version

Chinese, Japanese and Korean Text With No Word Breaks

Some Chinese, Japanese, and Korean text does not include word breaks. Instead, the text appears as lines of characters with no spaces between the words.

Because there are no spaces separating the words on each line, dtSearch sees each line of text as a single long word.

To make this type of text searchable, enable automatic insertion of word breaks around Chinese, Japanese, and Korean characters, so that each character is treated as a single word.

dtSearch Desktop/Network: In Options > Preferences > Letters and Words, check the box to “Insert word breaks between Chinese, Japanese, and Korean characters in text.”

dtSearch Developer API: set the dtsoTfAutoBreakCJK flag in Options.TextFlags.

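The effect of automatic CJK word breaks can be sketched in a few lines of Python: splitting on whitespace alone turns an unbroken line into one long word, while surrounding each CJK character with spaces yields one word per character. The Unicode ranges in `CJK_RANGES` are a partial set chosen for illustration (the exact ranges dtSearch treats as Chinese, Japanese, or Korean are not stated here), and `insert_cjk_breaks` is a hypothetical helper, not part of the dtSearch API.

```python
# Partial set of CJK ranges chosen for illustration only.
CJK_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def is_cjk(ch):
    """True if the character falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def insert_cjk_breaks(text):
    """Surround every CJK character with spaces so each becomes its own word."""
    return "".join(f" {ch} " if is_cjk(ch) else ch for ch in text)


def split_words(text):
    """Whitespace-only word splitting, as a plain tokenizer would do."""
    return text.split()


if __name__ == "__main__":
    line = "東京大学"
    print(split_words(line))                     # one long "word"
    print(split_words(insert_cjk_breaks(line)))  # one word per character
```

Non-CJK runs such as Latin words are left intact, so mixed-script lines still split normally.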
Language Analyzer API Integration

The dtSearch Engine includes a language analyzer API that can be used to integrate morphological analyzers and custom or dictionary-based word breakers into the dtSearch Engine indexing process.

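A dictionary-based word breaker of the kind such an API can host might look like the sketch below. It uses greedy longest-match (maximum matching) segmentation; the `WordBreaker` protocol, `DictionaryWordBreaker` class, and `index_document` function are hypothetical stand-ins and do not reflect the actual signatures of the dtSearch language analyzer API.

```python
from typing import Iterable, List, Protocol


class WordBreaker(Protocol):
    """Hypothetical plug-in interface; not the dtSearch language analyzer API."""

    def break_words(self, text: str) -> List[str]: ...


class DictionaryWordBreaker:
    """Greedy longest-match segmentation against a word dictionary."""

    def __init__(self, dictionary: Iterable[str]):
        self.dictionary = set(dictionary)
        self.max_len = max((len(w) for w in self.dictionary), default=1)

    def break_words(self, text: str) -> List[str]:
        words, i = [], 0
        while i < len(text):
            if text[i].isspace():
                i += 1
                continue
            # Try the longest dictionary word starting at i; an unknown
            # character falls through to a one-character word.
            for length in range(min(self.max_len, len(text) - i), 0, -1):
                candidate = text[i : i + length]
                if candidate in self.dictionary or length == 1:
                    words.append(candidate)
                    i += length
                    break
        return words


def index_document(text: str, breaker: WordBreaker) -> List[str]:
    """Stand-in for an indexing step that delegates word breaking to a plug-in."""
    return breaker.break_words(text)


if __name__ == "__main__":
    breaker = DictionaryWordBreaker(["東京", "大学", "東京大学", "で", "勉強"])
    print(index_document("東京大学で勉強", breaker))
```

Greedy longest match is the simplest dictionary method and can mis-segment ambiguous strings, which is one reason morphological analyzers are worth plugging in.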
The dtSearch Engine offers integration with Basis Technology's Rosette Linguistics Platform for enhanced Chinese, Japanese and Korean text retrieval.

The dtSearch Engine also includes an API for substituting a non-English language thesaurus for the existing English-language one.
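Thesaurus substitution can be illustrated as query expansion against a swappable synonym table. In this sketch the Spanish and English tables, `expand_query`, and the `OR`/`AND` output syntax are assumptions for illustration, not the dtSearch thesaurus API or its file format.

```python
from typing import Dict, List

# Tiny thesauri for illustration only.
SPANISH_THESAURUS: Dict[str, List[str]] = {
    "coche": ["automóvil", "carro"],
    "rápido": ["veloz", "ligero"],
}
ENGLISH_THESAURUS: Dict[str, List[str]] = {
    "car": ["automobile"],
}


def expand_term(term: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the term followed by its synonyms, if the thesaurus has any."""
    term = term.lower()
    return [term] + thesaurus.get(term, [])


def expand_query(query: str, thesaurus: Dict[str, List[str]]) -> str:
    """Rewrite each query word as an OR group of itself and its synonyms."""
    groups = []
    for word in query.split():
        alternatives = expand_term(word, thesaurus)
        if len(alternatives) == 1:
            groups.append(alternatives[0])
        else:
            groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)


if __name__ == "__main__":
    print(expand_query("coche rápido", SPANISH_THESAURUS))
    print(expand_query("car", ENGLISH_THESAURUS))
```

Changing the language only means passing a different table; the expansion logic stays the same.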
The dtSearch product line can instantly search terabytes of text across a desktop, network, Internet or Intranet site.

dtSearch products also serve as tools for publishing large document collections, with instant text searching, to Web sites or CD/DVDs.

- over two dozen indexed, unindexed, fielded and full-text search options
- highlights hits in HTML, XML and PDF, while displaying embedded links, formatting and images
- converts other file types (word processor, database, spreadsheet, email and the full text of email attachments, ZIP, Unicode, etc.) to HTML for display with highlighted hits
- built-in Spider adds a third-party or other Web site (public, secure content, password accessible, etc.) to your searchable database
- Spider supports Web-based content (HTML, PDF, XML, etc.) as well as dynamically generated content (ASP.NET, MS CMS, SharePoint, etc.)

General supported file types
SQL and similar data sources