Obsoleting entities in the MIDB

This document describes several ways in which entities can be obsoleted in the MIDB. Obsoleting an entity means marking it as no longer valid. In the current implementation this is done by setting the vend attribute from NULL to an integer (either stepped up from the vstart attribute or read from the Version table).
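As a minimal sketch of the operation described above, the following Python/SQLite snippet steps vend up from vstart. The table name `organization`, its `id` column and the helper `obsolete` are assumptions made for illustration; the real MIDB schema is not shown in this document, and the variant that reads the value from the Version table is not covered.

```python
import sqlite3

def obsolete(conn, table, entity_id):
    """Mark the valid row of an entity as obsolete by setting vend = vstart + 1."""
    # The table name is interpolated only because this is a sketch with trusted input.
    cur = conn.execute(
        f"UPDATE {table} SET vend = vstart + 1 WHERE id = ? AND vend IS NULL",
        (entity_id,),
    )
    return cur.rowcount  # number of rows obsoleted (0 if none was valid)

# Hypothetical, heavily simplified table: vend IS NULL means "still valid".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organization (id TEXT, vstart INTEGER NOT NULL, vend INTEGER)"
)
conn.execute("INSERT INTO organization VALUES ('Org1', 4, NULL)")

obsoleted = obsolete(conn, "organization", "Org1")
row = conn.execute(
    "SELECT vstart, vend FROM organization WHERE id = 'Org1'"
).fetchone()
```

Calling `obsolete` a second time on the same entity changes nothing, because the row no longer has vend set to NULL.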

The problem occurs when aggregating foreign data in which some of the entities are obsoleted. The aggregation module physically deletes obsoleted foreign entities from the local database, which causes problems with the database constraints. This document proposes several solutions to this problem.

Solution No. 1

This solution proposes that if an entity becomes obsolete, all entities that depend on it become obsolete as well. For example, obsoleting an organization will also obsolete all related entities (services, eservices and documents). Obsoleting an eservice, however, will not affect its organization, because one organization can have multiple eservices while an eservice can have only one organization as its owner. For tables such as sagency, eagency, sinter, etc., only the retrieval queries need to change, so that they check whether one of the entities involved is obsoleted. If it is, those associations will not be syndicated. The benefits of this solution are a relatively easy implementation in the EServices modules (WP4) and only minor changes in the aggregation module: the queries that aggregate tables such as eagency, sagency, etc. must detect associations in which one of the entities is obsoleted and physically delete them from the local database. The drawback of this approach is that it does not follow the business logic.
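The cascade proposed here could be sketched as follows in Python/SQLite. The table layout, the `owner` column on the dependent tables and the function name `obsolete_organization_cascade` are assumptions for illustration only; the actual MIDB schema may link these entities differently.

```python
import sqlite3

# Assumption: each dependent table carries an `owner` column naming its organization.
DEPENDENT_TABLES = ("service", "eservice", "document")

def obsolete_organization_cascade(conn, org_id):
    """Obsolete an organization together with every valid entity it owns."""
    with conn:  # one transaction: either everything is obsoleted or nothing
        for table in DEPENDENT_TABLES:
            conn.execute(
                f"UPDATE {table} SET vend = vstart + 1 "
                "WHERE owner = ? AND vend IS NULL",
                (org_id,),
            )
        conn.execute(
            "UPDATE organization SET vend = vstart + 1 "
            "WHERE id = ? AND vend IS NULL",
            (org_id,),
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organization (id TEXT, vstart INTEGER, vend INTEGER);
    CREATE TABLE service  (id TEXT, owner TEXT, vstart INTEGER, vend INTEGER);
    CREATE TABLE eservice (id TEXT, owner TEXT, vstart INTEGER, vend INTEGER);
    CREATE TABLE document (id TEXT, owner TEXT, vstart INTEGER, vend INTEGER);
    INSERT INTO organization VALUES ('Org1', 1, NULL);
    INSERT INTO service  VALUES ('Service1', 'Org1', 3, NULL);
    INSERT INTO service  VALUES ('Service2', 'Org2', 1, NULL);
    INSERT INTO eservice VALUES ('EService1', 'Org1', 2, NULL);
""")

obsolete_organization_cascade(conn, "Org1")
```

The cascade runs in one direction only: there is deliberately no function that obsoletes an organization when one of its eservices is obsoleted.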

[Radu Bonc1]Some considerations:

Solution #1 is more or less the current implementation, except for the aggregation changes. That is, obsoleted entities (Org, Doc, Prof, Service, eService) are not published, and neither are the entities referred to from dtext, otext, sinter, sagency, einter, eagency, stext and etext, because we only get the associations of valid, non-obsoleted main entities.

However, there is this scenario:

Consider Org1, which is the owner of Service1. This is the only reference to Org1; no other associations are in place. You can obsolete Org1 even though it is in use. The effect is the following: Org1 will not be published because it is obsoleted, but Service1 will be published because it is valid, with its owner pointing to a resource named Org1. In the RDF you'll get the description for Service1 but no description for Org1, even though there is a reference to Org1. This has a severe impact on aggregation which cannot be overcome: the aggregation will try to insert/update Service1 with a null owner, which will fail because of the db schema.
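The failure can be reproduced in a small Python/SQLite sketch. It assumes that the schema constraint in question is a NOT NULL owner column and models the published RDF as a plain dictionary; both are simplifications for illustration, and `resolve_owner` is a hypothetical helper, not part of the aggregation module.

```python
import sqlite3

# Hypothetical local table: a service must always have an owner.
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE service (id TEXT PRIMARY KEY, owner TEXT NOT NULL)")

# What was published: Service1 refers to Org1, but Org1 has no description
# of its own because it was obsoleted at the source.
published = {"Service1": {"owner": "Org1"}}

def resolve_owner(descriptions, service_id):
    """Return the owner only if it is itself described in the published data."""
    owner = descriptions[service_id]["owner"]
    return owner if owner in descriptions else None

owner = resolve_owner(published, "Service1")  # None: Org1 is not described

try:
    local.execute("INSERT INTO service VALUES (?, ?)", ("Service1", owner))
    failed = False
except sqlite3.IntegrityError:
    # The NOT NULL constraint rejects the row, so Service1 cannot be aggregated.
    failed = True
```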

Not publishing Service1 is not desirable either, as the user who obsoleted Org1 perhaps had no intention of hiding Service1, and it might also cascade into not publishing even more entities.

So, I think we should check associations/references before obsoleting.

Solution No. 2

With this solution, the logic behind the obsolete operation is slightly different. Before an entity is obsoleted, it must first be checked whether any other entities depend on it. If there are none, it can be obsoleted; otherwise an Exception will be thrown. For example, when trying to obsolete an organization, an Exception will be thrown if any services, eservices or documents are related to it; if there are none, the organization will be obsoleted properly. The implementation of this solution only affects the EServices modules (WP4), where a check is done before obsoleting an entity. The aggregation module remains unchanged. Implementing this solution would be slightly harder than the previous one.
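A minimal sketch of such a check, written in Python/SQLite for brevity, might look like this. The exception name `EntityInUseError`, the `owner` column and the table layout are assumptions for illustration and do not come from the EServices modules.

```python
import sqlite3

class EntityInUseError(Exception):
    """Raised when an entity still has valid dependents (hypothetical name)."""

# Assumption: each dependent table carries an `owner` column naming its organization.
DEPENDENT_TABLES = ("service", "eservice", "document")

def obsolete_organization(conn, org_id):
    """Obsolete an organization only if no valid entity still depends on it."""
    for table in DEPENDENT_TABLES:
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE owner = ? AND vend IS NULL",
            (org_id,),
        ).fetchone()
        if count:
            raise EntityInUseError(
                f"{org_id} is still referenced by {count} valid row(s) in {table}"
            )
    conn.execute(
        "UPDATE organization SET vend = vstart + 1 WHERE id = ? AND vend IS NULL",
        (org_id,),
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organization (id TEXT, vstart INTEGER, vend INTEGER);
    CREATE TABLE service  (id TEXT, owner TEXT, vstart INTEGER, vend INTEGER);
    CREATE TABLE eservice (id TEXT, owner TEXT, vstart INTEGER, vend INTEGER);
    CREATE TABLE document (id TEXT, owner TEXT, vstart INTEGER, vend INTEGER);
    INSERT INTO organization VALUES ('Org1', 1, NULL);
    INSERT INTO service VALUES ('Service1', 'Org1', 1, NULL);
""")

try:
    obsolete_organization(conn, "Org1")
    refused = False
except EntityInUseError:
    refused = True  # Service1 is still valid and owned by Org1
```

Once Service1 has been obsoleted, the same call succeeds, which is the behaviour asked for in the scenario with Org1 and Service1.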

Solution No. 3

The last solution proposes that foreign data be versioned as well, but not in the same way as national data. The versioning will not be done globally but per entity (that is, if there are multiple entities in the same table with different spocs ids, each of them will have its own version). When an entity is obsoleted, its vend value will be set from NULL to vstart+1. The benefit of this solution is its relatively easy implementation, and it only affects the aggregation module: instead of physically deleting obsoleted entities, their vend attribute is set to a finite value (vstart stepped up). Updates have to work in a similar way, by obsoleting the old foreign data and inserting new data with vstart and vend properly adjusted. The only problem with this solution is that the local MIDB will contain irrelevant information about foreign data (previous versions of foreign entities, which are not needed). Foreign data in tables such as sagency, eagency, sinter, einter could be either deleted or kept (I think it should be deleted).
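One possible reading of this per-entity versioning is sketched below in Python/SQLite. The table `foreign_org`, its columns and the function `aggregate` are hypothetical, and the choice to start a new version at the vend of the previous one is an assumption; the text only says that vstart and vend must be properly adjusted.

```python
import sqlite3

def aggregate(conn, spoc, entity_id, name):
    """Apply one foreign description; name=None means obsoleted at the source."""
    key = (spoc, entity_id)
    current = conn.execute(
        "SELECT vstart, name FROM foreign_org "
        "WHERE spoc = ? AND id = ? AND vend IS NULL",
        key,
    ).fetchone()
    if current is not None:
        if current[1] == name:
            return  # nothing changed, keep the current version
        # Obsolete the old version instead of physically deleting it.
        conn.execute(
            "UPDATE foreign_org SET vend = vstart + 1 "
            "WHERE spoc = ? AND id = ? AND vend IS NULL",
            key,
        )
    if name is not None:
        # Assumption: the new version starts where the last one ended (1 if none).
        (next_vstart,) = conn.execute(
            "SELECT COALESCE(MAX(vend), 1) FROM foreign_org WHERE spoc = ? AND id = ?",
            key,
        ).fetchone()
        conn.execute(
            "INSERT INTO foreign_org VALUES (?, ?, ?, ?, NULL)",
            (spoc, entity_id, name, next_vstart),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foreign_org "
    "(spoc TEXT, id TEXT, name TEXT, vstart INTEGER, vend INTEGER)"
)

aggregate(conn, "SPOC-A", "Org1", "Ministry")       # first sighting
aggregate(conn, "SPOC-A", "Org1", "Ministry of X")  # update
aggregate(conn, "SPOC-A", "Org1", None)             # obsoleted at the source
aggregate(conn, "SPOC-B", "Org1", "Agency")         # other spoc, own version

history = conn.execute(
    "SELECT name, vstart, vend FROM foreign_org "
    "WHERE spoc = 'SPOC-A' ORDER BY vstart"
).fetchall()
```

The history for SPOC-A keeps both superseded rows, which illustrates the drawback named above: previous versions of foreign entities accumulate in the local MIDB.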

[Radu Bonc2]Some considerations:

Not sure how we could manage the versions... Keeping a table with versions for foreign data might be one solution: a global version mechanism similar to the national one. Another would be to always search for the foreign entity's max(vstart), but this might affect performance for larger collections. A much more consistent approach is to pass vstart to the RDF descriptions when publishing and read vstart from the description when aggregating, basically implementing “what you see is what you get”. But this would require more work than solution #2, as we would have to change almost all Jena models.
