For Media Mining, the Future is Now! (conclusion)
May 5, 2015
(U) For Media Mining, the Future Is Now! (conclusion)
mom?and
Human Language [323)
Run Date: DBEUWEDUE.
I) Media Mining Across a Wide Range of Languages
One of the challenges in deploying this Media Mining HLT is the need to cover the very
broad range of languages. Only some of the languages of interest to the Agency are of
interest to commercial concerns because they are likely to be profitable, and businesses run on
profit.
Though COTS products such as NEXminer have covered "dense"
languages such as English and Spanish, and have made great inroads lately into a few
less-dense languages and dialects found in the Middle East, it is unclear that any COTS
product will ever cover the vast range of languages that NSA analysts are required to
understand. Therefore, the HLT PMO is developing an enhancement of this Media Mining
capability that can process over 90 languages using a combination of language-specific and
universal phones. This agency capability, developed within R64, the Human Language
Research Group, is known as Universal Phonetic Recognition (UPR).
I) New languages can be easily added to the capability by drawing on Agency linguistic
knowledge of a language combined with publicly available language resources. As events
shape our language needs, UPR provides a way to respond within minutes to new language needs,
for example those related to the GWOT.
(U) IVE: Technology that Can Separate the Wheat from the Chaff
I) A second, equally important enhancement under development is the ability for this HLT
capability to predict what intercepted data might be of interest to analysts based on their
past behavior. Much like the way in which popular sites are able to track and
predict buyer preferences, integration of Intelligence Value Estimation (IVE) on SRI and
message content offers the promise of presenting analysts with highly enriched sorting of their
traffic. Imagine if you came to work each day knowing that the best five intercepts needing
transcription were sitting at the top of your queue waiting for you.
Of course, such Media Mining IVE capabilities need not be limited to SRI and keyword
searches. In collaboration with S2UEB, Analytic for the Enterprise, the HLT PMO
Media Mining team is also developing new metadata analysis capabilities based on language,
speaker, gender, and dialect identification, presenting this to analysts through
conventional query tools such as UIS. Advanced like are integrating other of
such as geospatial will also send automatic alerts to analysts when
incoming intercept meets certain search criteria.
(SHED VoiceRT will be integrated with standard Agency voice tools such as UIS, and
analysts will be able to configure the via the web, and access scores on their
traffic using NUCLEON.
(U) Bringing it All Together
The integration of these technologies into an automated system will bring two major
innovations: faster response time and improved productivity. Our challenge goal is to "index, tag,
and graph" all incoming intercept, and this will soon be within reach. Using HLT services, a single
analyst will be able to sort through millions of cuts per day and focus on only the small percentage
that is relevant. The amount of collection can be increased by orders of magnitude without further
stressing the analyst population, allowing the Agency to cast a much wider SIGINT net and take
in a much richer catch.
I) And again, the power of HLT is truly realized through integration of multiple SIGINT
technologies. In the future, we will further develop technologies such as word search to support
cross-lingual queries. Sites that lack expertise in a given language will be able to issue queries in
English and receive results translated from the target language back into English. This marriage of
word search and Machine Translation has great potential as a force multiplier. Mapping meaning
and tradecraft across languages will be a key challenge here.
I) Similarly, because a search term will be tagged with a "semantic class identifier," such as
"place name," it will be relatively easy to integrate this technology with the Enterprise
Knowledge System and allow sophisticated capabilities such as social network analysis to
operate on voice content. In the HLT PMO long-term vision, analysts will be able to construct
complex queries, such as, "Where is the mayor of Baghdad?" or "Show me all the intercept
containing conversations about explosive devices that occurred yesterday in the downtown area of
Baghdad near the Al-Rashid Hotel," and obtain answers directly in English, or in their foreign
language if they prefer, with a link to the documents containing the answers.
We are entering a golden age for HLT. Powerful and inexpensive computers, high-speed
networking, and advanced algorithms are being combined to revolutionize the analyst
For more information about these capabilities, please contact the HLT PMO office ("go
HL or call .