Monday, June 20, 2005

pumping meaning through the network

For as long as books have been written about Data Processing and Information Science, those books have begun by making a distinction between Data and Information.

Take the popular Joe Celko, on one of the first pages of his 1999 Data And Databases, Concepts In Practice:

Information is what you get when you distill data. A collection of raw facts does not help
anyone to make a decision until it is reduced to a higher-level abstraction.

Now, this conception seems to me heavily biased by the practice of Management Information Systems and Data Warehouses, where the distillation process is indeed of prime importance. But as I have been working in the context of Health Informatics and the Electronic Patient Record (EPR), a very different focus is in order.

In the EPR, data about patients is stored and used by actors with different roles in the healthcare process: physicians of different specialties, nurses, administrative personnel, etc. The ideal (one of several) is that the patient data can be shared by this community, so that each of them gets a more complete picture of the patient.

The problem here is that data entered by x, with some meaning to x, will be retrieved by y and can have a very different meaning to y. This is because of the different contexts of the actors, their different cognitive backgrounds, etc. This is actually OK as long as the different meaning y attaches to the data is not a misinterpretation of the data.

Unfortunately such misinterpretations happen. The question is: can the EPR be designed and implemented in a way that makes it more robust against misinterpretations of the data stored in it?

Clearly, something can be done. If someone is recording the patient history and has two fields, Diabetes (yes/no) and Startdate (date), it would be wise to make absolutely clear in the record structure of the EPR that the fields belong together and that the date doesn't belong to Stopped Smoking (yes/no).
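
As a minimal sketch of what "belonging together" could look like in a record structure, assume Python and some hypothetical names (Diabetes, PatientHistory, start_date) that are not taken from any real EPR. The start date is nested inside the diabetes finding, so it cannot be read as the date of anything else:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Diabetes:
    """Hypothetical finding: the start date lives inside it."""
    present: bool
    start_date: Optional[date] = None

    def __post_init__(self) -> None:
        # A start date only makes sense when the condition is present.
        if not self.present and self.start_date is not None:
            raise ValueError("start_date given although diabetes is not present")


@dataclass(frozen=True)
class PatientHistory:
    """Hypothetical history record with two independent findings."""
    diabetes: Diabetes
    stopped_smoking: bool


# Made-up example values, for illustration only.
history = PatientHistory(
    diabetes=Diabetes(present=True, start_date=date(1998, 3, 1)),
    stopped_smoking=False,
)
```

Whoever builds a form for actor y reaches the date only through history.diabetes.start_date, so the relation between the two fields travels with the data instead of living only in the layout of a form.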

This is all common sense, but can easily be forgotten with the nice GUIs one can construct these days. It's a matter of minutes to give actor x a form with the said fields on it. And putting the Diabetes and Date fields together, or even putting a box around them, makes their relation to each other immediately clear to actor x. But this relation can get lost in the database, and someone constructing a nice form for actor y has to know about their relation in order to reconstruct a form that is not only apparently meaningful but also correct.

In what is perhaps a somewhat overstated metaphor, we see that the meaningful 'living' information of actor x gets 'buried' in the database as 'dead' data and has to be 'resurrected' to meaningful 'living' information again for actor y.

I think you will see by now that this is a different viewpoint on the data - information duality than that of Joe Celko. I'd like to call this the problem of Pumping Meaning Through The Network.

4 Comments:

Anonymous said...

Part of a solution is for form-designer x to give meaning to the fields beforehand. This meaning then has to be transferred to form-designer y. A traditional database has a strict data description. This need is equally relevant for an ad-hoc or organic database. I definitely agree that how data is presented also holds implicit meaning, as the last link in a fragile chain until the information reaches the user. Maybe someone can come up with a model for this!

1:35 PM  
Anonymous said...

Context is always important (see the so-called FUNARG problem, and that was in a "controlled environment").

1:41 PM  
Anonymous said...

Indeed, the database is the only place to store the meaning of data.
Both Oracle and SQL-Server have a catalog which serves as a store for the description of the structure of tables. But in SQL-Server it is not possible to add a description of a column or a table. So it is not a data dictionary. I am not sure about Oracle.
But there is more: we need more syntax to get more semantics.
For instance, we can ask the database: who is taller than 10 times his shoe length divided by his age? This is obviously rubbish, but it is possible. To avoid this, we need user-defined datatypes plus strong type checking in the database language. Oracle and SQL-Server offer some, but insufficient, possibilities.
Moreover, we need a syntax for defining constraints in the database. For example: nobody is allowed to earn more than his/her manager.
Both extensions add not only to the correctness of the database and the database language, but also to the perception of data by human beings.
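
A minimal sketch of both ideas, written in Python rather than in a database language, so it only illustrates the principle and says nothing about what Oracle or SQL-Server offer. All names (Height, ShoeLength, Employee) are hypothetical. A height can only be compared with another height, and an employee record that violates the manager constraint cannot be created:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Height:
    cm: float

    def __gt__(self, other: object) -> bool:
        # Strong typing: a height may only be compared with another height.
        if not isinstance(other, Height):
            raise TypeError("a Height can only be compared with another Height")
        return self.cm > other.cm


@dataclass(frozen=True)
class ShoeLength:
    # No arithmetic or comparison is defined, so mixing it into a
    # height comparison fails instead of silently producing rubbish.
    cm: float


@dataclass(frozen=True)
class Employee:
    name: str
    salary: float
    manager: Optional["Employee"] = None

    def __post_init__(self) -> None:
        # Constraint: nobody is allowed to earn more than his/her manager.
        if self.manager is not None and self.salary > self.manager.salary:
            raise ValueError(
                f"{self.name} would earn more than manager {self.manager.name}"
            )
```

With these definitions, Height(180.0) > ShoeLength(28.0) raises a TypeError, and creating an Employee whose salary exceeds that of the given manager raises a ValueError. This check runs only when a record is created; a real database constraint would also have to hold across later updates.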

3:58 PM  
freetrader said...

I like the idea (Zijlstra) of syntax supporting semantics. It would mean you don't have to let the data die in the DB, but only have to put them into cryogenic sleep.

Constraints on the DB would probably be even more useful if you were able to use them directly in the software that uses the data.

Indeed, it seems to me that DBMSs are at a dead end, the way they are constructed (and used) nowadays.
I.e. they show a total disrespect for the fragility of meaning.

2:28 PM  
