Questions About KIM Information Architecture (IA)

(1) Does the XML store need to integrate with both the Document Management system and the Record Management system?

Ø Yes, the XML store could be integrated with both the DMS and the RMS.

(2) According to the diagram, the Modelling Content and Authoring Content components are not directly linked with the XML Store.

Is the assumption correct that modelled and authored content is stored in the Record Management system and the Record management system pushes content into the XML store? This would mean that the authoring environment does not access the XML store directly.

Ø Logically, the Authoring Environment would talk to the DMS (SharePoint), and its contents would move to the RMS and the XML store upon finalisation. That is to say, the Authoring Environment may not push its contents directly to the XML store; they would transit through the DMS first. However, the Authoring Environment will access the XML store to retrieve existing fragments for repurposing.

Ø Whether certain post-editorial functions could be done more easily on XML store would need to be evaluated.

(3) In the IA diagram there is a direct link from the Knowledge portal to the XML Store.

Is it correct that the Knowledge portal accesses content only via the triple store?

Ø No. Information about content, including its location, will be stored in the triple store. When needed, the content would be retrieved from its current storage location, be it the XML store, Livelink, or other sources, if any.
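The lookup pattern described in this answer can be sketched as follows. This is only a minimal illustration in Python, assuming an in-memory list of triples and stub fetchers. The predicate names (`storedIn`, `locatedAt`), the content identifiers, the locations, and the fetcher functions are all hypothetical and are not part of the KIM design.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples: each content item records which system holds it
# and where inside that system it lives.
TRIPLES: List[Triple] = [
    ("doc:1", "storedIn", "xmlstore"),
    ("doc:1", "locatedAt", "/reports/doc1.xml"),
    ("doc:2", "storedIn", "livelink"),
    ("doc:2", "locatedAt", "node-4711"),
]


def lookup(subject: str, predicate: str) -> str:
    """Return the object of the first triple matching subject and predicate."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    raise KeyError(f"no triple for ({subject}, {predicate})")


# Stub fetchers standing in for real connectors to each storage system.
FETCHERS: Dict[str, Callable[[str], str]] = {
    "xmlstore": lambda loc: f"<content from XML store at {loc}>",
    "livelink": lambda loc: f"<content from Livelink at {loc}>",
}


def retrieve(content_id: str) -> str:
    """Resolve the location via the triples, then fetch from the owning system."""
    system = lookup(content_id, "storedIn")
    location = lookup(content_id, "locatedAt")
    return FETCHERS[system](location)


if __name__ == "__main__":
    print(retrieve("doc:1"))
    print(retrieve("doc:2"))
```

In this sketch the portal never needs to know in advance where an item lives; it asks the triple store first and only then contacts the storage system named in the metadata.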

(4) In 2.1.2.3 KIM IA Design, 6th bulleted list item it is mentioned (along with other facts) that the XML Store stores structured, semi-structured and unstructured content.

Are all content assets (Collected content and Authored Content) to be stored in the XML Store, so that the XML store will act as the global content repository managing all content assets?

Ø As the vision for the future, yes. All valuable content assets could be stored in XML store.

(5) Semantic Enrichment

According to the diagram the output of the semantic enrichment is stored in the XML store. Is there no direct link/interfacing between the tool for semantic enrichment (Temis Luxid) and the triple store?

Which component/system is responsible for generating the triples that are stored in the triple store?

Ø The triple store should be seen as a logical store; physically, it can have multiple instances.

Ø Triples in the triple store could come from a variety of sources.

· Specifically for official OECD content, both semantic and syntactic links will be embedded within the content.

· The store for legacy content and other types of information, such as videos or pictures, is the RMS. In this case the semantic enrichment would run directly on the RMS and push the triples to the triple store.

· Other sources for triples could be the statistical databases and so on.
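One way to picture a logical triple store backed by several physical instances is sketched below. It is a hedged illustration only: the class `LogicalTripleStore`, the three example instances, and all identifiers and predicates are hypothetical, and plain Python sets stand in for real triple store instances.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class LogicalTripleStore:
    """Presents several physical triple sets as one queryable store."""

    def __init__(self, instances: Iterable[Set[Triple]]) -> None:
        self._instances = list(instances)

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        seen: Set[Triple] = set()
        results: List[Triple] = []
        for instance in self._instances:
            # Sort for a deterministic order within each instance.
            for triple in sorted(instance):
                if triple in seen:
                    continue  # the same triple may exist in two instances
                ts, tp, to = triple
                if (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to):
                    seen.add(triple)
                    results.append(triple)
        return results


# Hypothetical instances, one per source of triples named in the answer.
embedded_links = {("pub:1", "cites", "pub:2")}
rms_enrichment = {("video:9", "hasTopic", "trade"), ("pub:1", "cites", "pub:2")}
statistics = {("dataset:3", "hasTopic", "trade")}

store = LogicalTripleStore([embedded_links, rms_enrichment, statistics])

if __name__ == "__main__":
    print(store.match(p="hasTopic", o="trade"))
```

A caller queries the logical store once and does not need to know which physical instance contributed each triple.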

Question about Search (2.2.8 Search Strategy):

Are Exalead (search for external and internal audiences) and SharePoint/FAST (search for internal audiences) the only systems used for search? Are there plans for products and information applications built on top of the XML store, as well as audiences, to leverage the search capabilities of the XML store directly?

Ø We have Exalead and FAST in place, and we would like to ensure the delivery of content from the XML store through these search engines. Please illustrate how best to integrate, what would be involved, whether and how it has been done, etc.

Question about Section 2.4.1.9 Integration/APIs/Connectors

5) Provide examples of how the tool has already been implemented in technical environments similar to OECD’s technical environment. Describe the functionality provided and how the integration is implemented.

Does OECD wish us to detail how we integrate with the technologies in section 1, i.e. “Microsoft SharePoint, Active Directory, Microsoft SQL Server, an RDF store, OpenText Content Server, TEMIS Luxid, Microsoft Office, Outlook, Terminal Four (a Web CMS), and the file system”?

Or does OECD wish us to more generally describe how we integrate into similar technical environments as detailed in section 2.3.1 of “CFT 100000562 EnterpriseNativeXMLStore 20130927 Final.pdf” by providing reference architectures?

Ø The former.

Question about 2.4.1.4 – Querying

7) Can your product do KWIC (key-word in context) queries?

Can OECD elaborate on what this means to them? Does this simply mean keyword search in any part of the document, or something more?

Ø KWIC means keyword search in any part of the document. For more information and an example, please refer to http://en.wikipedia.org/wiki/Key_Word_in_Context.
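For reference, KWIC in the sense of the linked article is a concordance-style display in which each occurrence of the keyword is shown together with the words around it. The following is a minimal Python sketch of that idea, assuming simple word tokenisation; the function name `kwic` and its `width` parameter are illustrative and do not correspond to any product API.

```python
import re
from typing import List, Tuple


def kwic(text: str, keyword: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of words of context kept on each side.
    Matching is case-insensitive and works on whole words only.
    """
    tokens = re.findall(r"\w+", text)
    hits: List[Tuple[str, str, str]] = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    sample = "The XML store feeds the triple store"
    for left, word, right in kwic(sample, "store", width=2):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>12} [{word}] {right}")
```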

MarkLogic Sizing Questionnaire

The table in Section 2.3.2 Sizing Information contains information that should be taken into account in order to size the proposed configuration in terms of storage. In order to do a realistic sizing, we would like to ask the OECD for additional information:

Ø By “size” do you mean text-only documents, or text + XML elements and attributes?

· What types/ formats of content will be stored in the XML store (i.e. XML, PDF, Images, etc.)?

Ø XML, PDF, Office documents, SVG images, and binary images.

· What is the average size of the different kind of documents (average size per type)?

Ø XML: the size can vary enormously, from a one-pager to a hundred-page chapter;

Ø PDF: similar to XML;

Ø Office documents: similar to XML;

Ø SVG images: no experience yet;

Ø Binary images: varies.

· How many documents that are imported into the XML store are loaded initially in the database (by type/format)?

Ø We are not in a position to answer this question at this time.

· What is the update rate for records/ documents?

Ø We don’t have an exact estimate of the update rate at the moment. Please illustrate your capacity based on the OECD environment description.

· Does the OECD want to leverage wildcard search, proximity search, phrase search?

Ø Wildcard search: certainly;

Ø Proximity search: possibly;

Ø Phrase search: would be nice.
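The three query types mentioned above differ as follows. This is a minimal Python sketch over a tokenised text, assuming whole-word tokens and case-insensitive matching; the function names are illustrative and this is not how Exalead, FAST, or any XML store implements these features.

```python
import fnmatch
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def wildcard_match(tokens: List[str], pattern: str) -> bool:
    """True if any token matches a pattern using * and ? wildcards."""
    return any(fnmatch.fnmatchcase(t, pattern.lower()) for t in tokens)


def phrase_match(tokens: List[str], phrase: str) -> bool:
    """True if the words of the phrase occur adjacently and in order."""
    words = tokenize(phrase)
    n = len(words)
    return n > 0 and any(
        tokens[i:i + n] == words for i in range(len(tokens) - n + 1)
    )


def proximity_match(tokens: List[str], a: str, b: str, max_distance: int) -> bool:
    """True if a and b occur within max_distance token positions of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


if __name__ == "__main__":
    doc = tokenize("Semantic enrichment pushes triples to the triple store")
    print(wildcard_match(doc, "enrich*"))
    print(phrase_match(doc, "triple store"))
    print(proximity_match(doc, "enrichment", "triples", 2))
```

Each of these options typically has an indexing cost, which is why the questionnaire asks about them in the context of sizing.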

· What is the query rate (number of queries per second)?

Ø We don’t have an exact estimation at the moment. However, the platform should be scalable as required.
