The article "A Brief History of Text Analytics" discusses a new challenge in the field of business intelligence. With today's technologies we can look beyond numbers and finally analyze unstructured data:
"Attention in the business intelligence market has turned to a new “old”
challenge – text and the technology that extends conventional data
analysis solutions to the full breadth of enterprise information
assets."
The full article is also available as a podcast.
Text analytics is a new IT discipline that has already proved
itself in applications ranging from pharmaceutical drug discovery to
counter-terrorism to survey analysis, in science, government, and
industry. It is poised to break out into the broader analytics market,
in workbench form, integrated with business intelligence solutions,
embedded in line-of-business applications, and enabling semantic
search.
Text analytics is an answer to the “unstructured data” problem,
which is best expressed by the truism that eighty percent of enterprise
information originates and is locked in “unstructured” form. That
problem has been recognized for decades. In fact, the first definition
of business intelligence (BI) itself, in an October 1958 IBM Journal
article by H.P. Luhn, “A Business Intelligence System,” describes a system that will:
“…utilize
data-processing machines for auto-abstracting and auto-encoding of
documents and for creating interest profiles for each of the ‘action
points’ in an organization. Both incoming and internally generated
documents are automatically abstracted, characterized by a word
pattern, and sent automatically to appropriate action points.”
So we see that the earliest BI focus was on text – on
extraction, categorization, and classification rather than on numerical
data!
Yet as management information systems developed starting in the
1960s, and as BI emerged in the '80s and '90s as a software category
and field of practice, the emphasis was on numerical data stored in
relational databases. This is not surprising: text in “unstructured”
documents is hard to process. We went after the low-hanging fruit – the
fielded, numerical data – in response to the analytics imperative that
any business process worth conducting should be measurable, and that
any data worth collecting should be analyzed.
After two decades of numbers-focused business intelligence,
analytical tools and techniques – reporting, OLAP, data mining, ETL and
data warehousing – are well understood and have been widely adopted. BI
software is now a commodity technology, so lately market attention has
turned to a new “old” challenge – text. The difference is that we now
have technology that is uniquely capable of extending conventional data
analysis solutions to the full breadth of enterprise information
assets.
Text analytics as a technology has its roots in linguistics and
data mining; but in recent years, it has broken out of the lab into the
wider analytics world, first via extensions to data mining workbenches
and more recently in the form of term-extraction and analysis
interfaces. The ability to discern features in text – for instance,
personal and geographic names, dates, telephone numbers and e-mail
addresses, as well as concepts and even sentiments – and to extract
them to databases is now an important feature of leading ETL tools. We
have recently started seeing line-of-business applications that rely on
text analytics (for instance, for automated processing of news feeds)
that demonstrate the technology’s new maturity.
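To make the "discern features and extract them to databases" idea concrete, here is a minimal sketch in Python. It is not how any particular ETL tool works: the regular expressions, the `features` table, and the function names `extract_features` and `load_features` are all assumptions made for illustration. Real products rely on much richer linguistic models, especially for names, concepts, and sentiment.

```python
import re
import sqlite3

# Illustrative patterns only (assumptions, not taken from the article).
# They cover e-mail addresses, ISO-style dates, and phone numbers
# written with a leading "+" country code.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "phone": re.compile(r"\+\d[\d -]{7,}\d"),
}


def extract_features(text):
    """Return (kind, value) pairs for every pattern match found in text."""
    return [
        (kind, match.group())
        for kind, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def load_features(conn, doc_id, text):
    """Store the extracted features in a hypothetical 'features' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features (doc_id TEXT, kind TEXT, value TEXT)"
    )
    rows = [(doc_id, kind, value) for kind, value in extract_features(text)]
    conn.executemany("INSERT INTO features VALUES (?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = "Call +31 20 555 0100 or mail info@example.com before 2008-03-14."
    load_features(conn, "doc1", sample)
    for row in conn.execute("SELECT doc_id, kind, value FROM features"):
        print(row)
```

Once the features sit in a table, they can be queried and joined like any other fielded data, which is the point the paragraph above makes about bringing text into conventional analysis.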
Read more at: B-Eye-Network