Search engines may be more than adequate to comparison shop or to identify the capital of Moldova (it’s Chisinau) – but searching the electronic universe to find patterns indicating terrorist activity requires higher-caliber technology.
A new generation of software called Starlight 3.0, developed for the Department of Homeland Security by the Pacific Northwest National Laboratory (PNNL), can unravel the complex web of relationships between people, places, and events. And other new software can even provide answers to unasked questions.
Anticipating terrorist activity requires continually decoding the meaning behind countless emails, Web pages, financial transactions, and other documents, according to Jim Thomas, director of the National Visualization and Analytics Center (NVAC) in Richland, Washington.
Federal agencies participating in terrorism prevention monitor computer networks, wiretap phones, and gather public records and private financial transactions into massive data repositories.
“We need technologies to deal with complex, conflicting, and sometimes deceptive information,” says Thomas at NVAC, which was founded last year to detect and reduce the threats of terrorist attacks.
In September 2005, NVAC, a division of the PNNL, will release its Starlight 3.0 visual analytics software, which graphically displays the relationships and interactions between documents containing text, images, audio, and video.
The previous generation of the software was not fully visual and contained separate modules for different functions. The new version has been redesigned with an enhanced graphical interface that allows intelligence personnel to analyze larger datasets interactively, discard unrelated content, and add new streams of data as they are received, according to John Risch, a chief scientist at PNNL.
Starlight quadruples the number of documents that can be analyzed at one time – from the previous 10,000 to 40,000 – depending on the type of files. It also permits multiple visualizations to be opened simultaneously, which allows officers for the first time to analyze geospatial data within the program. According to Risch, a user will be able to see not only when but where and in what proximity to each other activities occurred.
“For tracking terrorist networks, you can simultaneously bring in telephone intercepts, financial transactions, and other documents all into one place, which wasn’t possible before,” Risch says.
The Windows-based program describes and stores data in the XML (extensible markup language) format and automatically converts data from other formats, such as databases and audio transcriptions.
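As a rough illustration of what converting a database record to XML can look like, here is a minimal Python sketch. The article does not describe Starlight's actual schema or conversion code, so the function name `record_to_xml`, the element names, and the sample row are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, tag="document"):
    """Wrap a flat dict (for example, one database row) in an XML element.

    Hypothetical helper: the element layout is an assumption,
    not Starlight's real format.
    """
    root = ET.Element(tag)
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return root


# Invented sample row, for illustration only.
row = {
    "id": 17,
    "source": "phone_intercept",
    "summary": "Call placed at 09:14 & lasted 3 min",
}

# encoding="unicode" returns a str; special characters such as "&" are escaped.
xml_text = ET.tostring(record_to_xml(row), encoding="unicode")
print(xml_text)
```

Storing every source in one markup format is what lets a single tool treat intercepts, transactions, and transcripts uniformly, at the cost of a conversion step for each input type.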
Risch says that as the volume of data being collected increases, the software has to be more efficient in visually representing the complex relationships between documents.
“Starlight can show all the links found on a Web page, summarize the topics discussed on those pages and how they are connected [to the original page],” Risch says.
PNNL is also continuing to enhance IN-SPIRE, its software that extracts the meaning of large datasets and allows users to pose alternative hypotheses and see the data supporting each scenario, according to Thomas. For instance, an analyst could posit that Osama Bin Laden is planning an attack on a European nation at a given time and with a particular weapon. IN-SPIRE will look for relationships between documents that validate the hypothesis; for example, the software would look for the most likely nearby locations where such a weapon could be acquired and whether secondary or tertiary associates have visited those areas.
Thomas says IN-SPIRE can search documents in multiple languages simultaneously and enables the “discovery of the unexpected.”
Visualizations generated by both of PNNL’s programs graphically depict relationships between content by displaying them in a variety of formats, such as a star cluster showing more popular topics as larger stars; topographic maps; or a river of information showing interest in a topic over time. Generating visualizations instead of relying on text-based searches “allows the human mind insight into fuzzy relationships and tries to resolve uncertainty,” says Thomas.
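The idea behind a “river of information” can be approximated in a few lines of Python: count how often each topic appears in each time period, then draw the counts. This is only a sketch under stated assumptions. The article does not say how PNNL's programs compute or render these views, and the function `topic_timeline`, the topic labels, and the sample data are invented for illustration.

```python
from collections import Counter, defaultdict


def topic_timeline(documents):
    """Count documents per topic within each time period.

    `documents` is an iterable of (period, topic) pairs. Hypothetical
    helper for illustration; not PNNL's method.
    """
    timeline = defaultdict(Counter)
    for period, topic in documents:
        timeline[period][topic] += 1
    return timeline


# Invented sample data: (month, topic) for six documents.
docs = [
    ("2005-01", "shipping"),
    ("2005-01", "shipping"),
    ("2005-01", "travel"),
    ("2005-02", "travel"),
    ("2005-02", "travel"),
    ("2005-02", "travel"),
]

timeline = topic_timeline(docs)

# Crude text rendering: one bar per topic per month, longer bar = more interest.
for period in sorted(timeline):
    for topic, count in timeline[period].most_common():
        print(period, topic.ljust(10), "#" * count)
```

The same counts could size the stars in a cluster view; only the drawing step changes.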
NVAC is not the only organization developing analytical software for the federal government. The Department of Defense is using software from Intelligenxia called IxReveal to track online message threads and give “answers to questions that haven’t been asked,” according to Ren Mohan, co-chairman and CTO of the Jacksonville, Florida-based data analysis company.
Mohan says that, because “we often don’t know what we don’t know” about terrorist activities, analysts employ the company’s IxReveal to extract the topics that are being discussed most frequently rather than searching for specific items. This approach can overcome analyst bias by exposing all of the important concepts currently being discussed in chat rooms, email, or user groups, according to Mohan.
IxReveal can drill down through multiple paths simultaneously, enabling analysts to see multiple dimensions and possibilities, according to Mohan. The value of textual data is often hidden and must be extracted, in a timely fashion, by automatically identifying key ideas that focus on concepts rather than details, he says.
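To make the contrast with keyword search concrete, the following Python sketch surfaces the most widely discussed terms in a set of messages without being given a query. It uses plain word counting, which is an assumption chosen for brevity. The article does not describe how IxReveal identifies topics, and the stopword list, the function `top_terms`, and the sample messages are invented for this example.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "at", "we", "it"}


def top_terms(messages, n=3):
    """Return the n terms that appear in the most messages.

    Each term is counted once per message, so one long message
    cannot dominate the ranking. Hypothetical helper for illustration.
    """
    counts = Counter()
    for message in messages:
        words = set(re.findall(r"[a-z']+", message.lower()))
        counts.update(words - STOPWORDS)
    return counts.most_common(n)


# Invented sample messages.
messages = [
    "The shipment is delayed at the port",
    "We moved the shipment to a new port",
    "Is the port open on Friday",
]

print(top_terms(messages, n=2))  # [('port', 3), ('shipment', 2)]
```

No search term was supplied, yet “port” and “shipment” rise to the top, which is the sense in which such a tool can answer a question the analyst did not think to ask.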
“We are trying to address the secondary questions (about data),” Mohan says, adding that his company takes input from analysts to refine the technology.
Not so surprisingly, the number of researchers working on visualization software will greatly increase this year. What’s more, the Department of Homeland Security is looking to create new generations of terrorism-tracking software by tapping into the “fresh ideas” of current university students, according to NVAC’s Thomas.
This year, NVAC will establish five regional analytics centers to tackle specific applications for fighting terrorism. It has selected Stanford University as the first center, with a mission that includes analyzing computer networks to detect network intrusions. The regional centers will be a way to introduce both students and faculty to anti-terror efforts and the science of protecting homeland security, according to Thomas.
Thomas said 85 people are currently working on NVAC’s software development effort, and up to 500 individuals could be involved after all of the regional centers are established.
“The biggest challenge is getting a common understanding of the core science” used to analyze large volumes of data, Thomas said. “If we can clearly articulate it, then that’s half the problem solved.”