Encyclopaedia Britannica
Encyclopaedia Britannica’s Cross-Language Morphological Search Plug-In Integrates with dtSearch
“We are delighted to partner with dtSearch and provide state of the art foreign language solutions for our customers.”
With a focus on Arabic, Farsi and other Middle Eastern languages, Encyclopaedia Britannica has developed a rich product suite that allows English-speaking users to review and analyze foreign-language source data. Components of the product suite include Britannica’s Cross-Language Morphological Analysis (BMA), Cross-Language Entity Extraction (EntX), and Embedded Translation Layer (ETL).
BMA analyzes each word or phrase in the source language, disambiguates it, and normalizes it to a common form in both the source language and English. A single query can therefore capture thousands of complex inflections. In addition to including a full external thesaurus, BMA enables cross-language search, in which users issue English-language queries against foreign-language text. Britannica’s patent-pending ETL provides an accurate, context-sensitive English translation of each word or phrase in the searched content and is available for Arabic and Farsi.
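The idea of normalizing inflected forms to a common form, and of searching foreign-language text with an English query, can be illustrated with a minimal Python sketch. This is not Britannica's algorithm and does not use the dtSearch Engine API: the lexicon `TOY_LEXICON`, the functions `normalize`, `build_index` and `search`, and the sample documents are all hypothetical, and a real analyzer would disambiguate words in context rather than use a fixed lookup table.

```python
from collections import defaultdict

# Hypothetical toy lexicon: surface form -> (source-language lemma, English gloss).
# A real morphological analyzer would derive this from context, not a fixed table.
TOY_LEXICON = {
    "كتاب": ("كتاب", "book"),
    "الكتاب": ("كتاب", "book"),
    "كتب": ("كتاب", "book"),
    "مدرسة": ("مدرسة", "school"),
    "المدرسة": ("مدرسة", "school"),
    "مدارس": ("مدرسة", "school"),
}


def normalize(token):
    """Return (lemma, gloss); unknown tokens map to themselves with no gloss."""
    return TOY_LEXICON.get(token, (token, None))


def build_index(docs):
    """Index every document under both the lemma and the English gloss."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            lemma, gloss = normalize(token)
            index[lemma].add(doc_id)
            if gloss is not None:
                index[gloss].add(doc_id)
    return index


def search(index, query):
    """Look up a query by its normalized lemma or by its English form."""
    lemma, _ = normalize(query)
    hits = index.get(lemma, set()) | index.get(query.lower(), set())
    return sorted(hits)


# Hypothetical sample documents.
DOCS = {
    1: "الكتاب في المدرسة",
    2: "كتب جديدة",
    3: "مدارس كبيرة",
}
INDEX = build_index(DOCS)
```

In this sketch, `search(INDEX, "book")` and `search(INDEX, "كتاب")` both return documents 1 and 2, because the definite and plural forms were folded into one lemma at indexing time.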
“Britannica’s morphology suite seamlessly integrates with the dtSearch Engine developer APIs, enabling users to use English language queries to search for foreign languages, overcoming morphological complexity and ambiguity.”
EntX accurately extracts keywords and named entities from foreign-language documents, enabling efficient triage, categorization and concept search. EntX provides these entities, such as proper names and places, both in their original language and in English. EntX also supports English user-defined categories, so that taxonomies and ontologies developed in English can be applied directly to the information in the source languages.
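Applying an English taxonomy to entities extracted from foreign-language text can be sketched as follows. This is an illustration only, not the EntX API: the `Entity` class, the `TAXONOMY` table, the `categorize` function and the sample entities are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A hypothetical extracted entity with its original and English forms."""
    original: str
    english: str


# Hypothetical English taxonomy: category name -> English entity names.
TAXONOMY = {
    "Cities": {"Cairo", "Tehran"},
    "Organizations": {"United Nations"},
}

# Hypothetical extraction output for a small set of documents.
ENTITIES = [
    Entity("القاهرة", "Cairo"),
    Entity("تهران", "Tehran"),
    Entity("الأمم المتحدة", "United Nations"),
]


def categorize(entities, taxonomy):
    """Group original-language entities under English-defined categories."""
    result = {}
    for category, names in taxonomy.items():
        matches = [e.original for e in entities if e.english in names]
        if matches:
            result[category] = matches
    return result


CATEGORIES = categorize(ENTITIES, TAXONOMY)
```

Because each entity carries its English form, the categories are defined once in English and still group the original Arabic and Farsi strings.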
The Britannica language analysis suite integrates with the dtSearch Engine APIs at three levels. First, the BMA adds morphological capabilities to the indexing and query process. Second, the BMA’s external thesaurus and ETL can be plugged in through the dtSearch Engine APIs to implement multilingual synonym and cross-language search. Third, Britannica’s EntX can be integrated with the dtSearch Engine through C++ and Java APIs. Sample code for all of these levels of integration is available by contacting Britannica.
“We are delighted to partner with dtSearch and provide state of the art foreign language solutions for our customers,” says David Litoff, Director of Business Development, Natural Language Division, Encyclopaedia Britannica. “Britannica’s morphological suite seamlessly integrates with the dtSearch Engine developer APIs, enabling users to use English language queries to search for foreign languages, overcoming morphological complexity and ambiguity. All methods enable smooth and transparent integration, adding Britannica’s language capabilities while maintaining the full range of dtSearch’s flexible search capacity. The transparency of the search integration ensures that any interface provided by dtSearch, whether programmatic or through a graphic UI, is available when using the integrated product.”
“We are very excited to work with Encyclopaedia Britannica’s Natural Language Division,” said Elizabeth Thede, Vice President of Sales. “Applying Encyclopaedia Britannica’s linguistic expertise will be a great help to dtSearch customers requiring advanced searching of text in Arabic and other Middle-Eastern Languages.”
For a white paper further describing the Encyclopaedia Britannica morphological analysis suite, please see EB-WhitePaper-for-Foreign-language-Analysis.doc.
For additional information on the Encyclopaedia Britannica morphological analysis suite and its integration with the dtSearch developer APIs, please click here.
The dtSearch product line can instantly search terabytes of text across a desktop, network, Internet or Intranet site.
dtSearch products also serve as tools for publishing, with instant text searching, large document collections to Web sites or CD/DVDs.
over two dozen indexed, unindexed, fielded and full-text search options
highlights hits in HTML, XML and PDF, while displaying embedded links, formatting and images
converts other file types (word processor, database, spreadsheet, email and full-text of email attachments, ZIP, Unicode, etc.) to HTML for display with highlighted hits
built-in Spider adds a third-party or other Web site (public, secure content, password accessible, etc.) to your searchable database
Spider supports Web-based content (HTML, PDF, XML, etc.) as well as dynamically-generated content (ASP.NET, MS CMS, SharePoint, etc.)
General supported file types
SQL and similar data sources