A new BI challenge: text analytics
08 Nov 2007

The article "A Brief History of Text Analytics" looks at a new challenge in the field of business intelligence. With today's technologies we can look beyond numbers and finally analyze unstructured data:

"Attention in the business intelligence market has turned to a new “old” challenge – text and the technology that extends conventional data analysis solutions to the full breadth of enterprise information assets."

The full article is also available as a podcast.


Text analytics is a new IT discipline that has already proved itself in applications ranging from pharmaceutical drug discovery to counter-terrorism to survey analysis, in science, government, and industry. It is poised to break out into the broader analytics market, in workbench form, integrated with business intelligence solutions, embedded in line-of-business applications, and enabling semantic search.

Text analytics is an answer to the “unstructured data” problem, which is best expressed by the truism that eighty percent of enterprise information originates and is locked in “unstructured” form. That problem has been recognized for decades. In fact, the first definition of business intelligence (BI) itself, in an October 1958 IBM Journal article by H.P. Luhn, “A Business Intelligence System,” describes a system that will:

“…utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the ‘action points’ in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points.”

So we see that the earliest BI focus was on text – on extraction, categorization, and classification rather than on numerical data!

Yet as management information systems developed starting in the 1960s, and as BI emerged in the '80s and '90s as a software category and field of practice, the emphasis was on numerical data stored in relational databases. This is not surprising: text in “unstructured” documents is hard to process. We went after the low-hanging fruit – the fielded, numerical data – in response to the analytics imperative that any business process worth conducting should be measurable, and that any data worth collecting should be analyzed.

After two decades of numbers-focused business intelligence, analytical tools and techniques – reporting, OLAP, data mining, ETL and data warehousing – are well understood and have been widely adopted. BI software is now a commodity technology, so lately market attention has turned to a new “old” challenge – text. The difference is that we now have technology that is uniquely capable of extending conventional data analysis solutions to the full breadth of enterprise information assets.

Text analytics as a technology has its roots in linguistics and data mining; but in recent years, it has broken out of the lab into the wider analytics world, first via extensions to data mining workbenches and more recently in the form of term-extraction and analysis interfaces. The ability to discern features in text – for instance, personal and geographic names, dates, telephone numbers and e-mail addresses, as well as concepts and even sentiments – and to extract them to databases is now an important feature of leading ETL tools. We have recently started seeing line-of-business applications that rely on text analytics (for instance, for automated processing of news feeds) that demonstrate the technology’s new maturity.
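
To make the idea of discerning features in text and extracting them to a database more concrete, here is a minimal sketch in Python. It is not taken from the article: the patterns, the `extract_features` function and the sample sentence are all hypothetical. It also covers only the easiest cases (e-mail addresses, ISO-style dates and international phone numbers), while real text analytics tools rely on much richer linguistic models.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),      # ISO-style dates only
    "phone": re.compile(r"\+\d[\d\s-]{7,}\d"),         # requires a leading "+"
}


def extract_features(text):
    """Return a list of (feature_type, matched_text) rows found in text."""
    rows = []
    for feature_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            rows.append((feature_type, match.group()))
    return rows


sample = "Contact jan@example.com before 2007-11-08 or call +31 20 555 0100."
for row in extract_features(sample):
    print(row)
```

Each returned row is a plain tuple, so it could be loaded into a database table with one column for the feature type and one for the extracted value. Names, concepts and sentiments cannot be captured with patterns like these and would need dedicated linguistic or statistical techniques.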

Read more at: B-Eye-Network

 

 
