Entity Management Extracts

Interface Specification

INT047-INT050 – v3.2

April 30, 2013

Change Log

Version # / Date of Change / Section / Description of Change / Changed By
1.0 / 01/20/2011 / All / Initial draft combined with Sunki’s work / James Neely
1.1 / 01/24/2011 / Requirements, Appendix B / Added language about the differences between the extracts and also included Appendix B / Adam Stansifer
1.2 / 02/14/2011 / All / Added technical detail, clarified roles and permissions information, and edited formatting / Don Tennant
1.3 / 02/15/2011 / All / Formatting / Don Tennant
2.0 / 04/04/2011 / All / Revisions throughout / Don Tennant
2.1 / 5/4/2011 / All / Reviewed / Pushkar Varma
2.2 / 6/20/2011 / All / Reviewed / Ryan Stenberg
2.3 / 2/7/2012 / All / Added MPIN Extract requirement as requested by client / Evan Botsford
2.4 / 11/26/2012 / 3, 4, 5, 8, 9 / Changed file naming standard, provided more details regarding the file within the zip file, added information regarding header, footer and file delimiters. Added data persistence policy, exclusion criteria and the grain of extract. Added a section 9 to document finalized attribute list. / Himanshu Verma
2.6 / 12/11/2012 / All / Responded to comments from GSA / Himanshu Verma
3.0 / 01/25/2013 / 8.1 and 8.2 / Responded to feedback from end users of these files. / Himanshu Verma
3.1 / 02/21/2013 / Added Clarification of multiple zip files based on NASA Feedback. / Adam Stansifer
3.2 / 03/06/2013 / Added Clarification of MDAT File / Adam Stansifer
3.1 / 4/30/2013 / Updated file naming conventions / Adam Stansifer

Table of Contents

1 Overview
2 Requirements
  2.1 Interface Requirements
  2.2 Interface File Formats
  2.3 Interface Details
3 Naming Conventions
  3.1 File Naming Conventions
  3.2 SFTP and Web File Directory
4 Data Format
  4.1 Delimited
  4.2 New Line Separators
  4.3 Header
  4.4 Footer
5 Rules and Policies
  5.1 File Posting Date / Time
  5.2 Processing Rules
  5.3 Interface Archive Policy
  5.4 Data Persistence Policy
  5.5 Interface Deletion Policy
  5.6 Refresh Policy
6 Security
7 Interface Agreement
  7.1 Policies regarding the dissemination of SAM data
8 Extract Grain & Inclusion Criteria
  8.1 Extract Grain (Uniqueness Criteria)
  8.2 Inclusion Criteria
9 Attribute List
10 Data Standardization Rules

1 Overview

The purpose of this document is to describe the Entity Management Extracts that allow SAM to share, where appropriate, entity profiles within the federal government for many functions, including procurement, contracts, payments, size determination, and validation. The interface provides data pertaining to all vendor organizations to authorized and authenticated users via Secure File Transfer Protocol (SFTP) or the Extracts section of the SAM website. The interface includes daily downloads of database updates and new records as well as a monthly refresh of the entire SAM Entity Management database. This interface is automatically scheduled for generation by SAM; the resultant extracts are then made available for automatic or manual retrieval by the SAM consumer.

2 Requirements

2.1 Interface Requirements

Each list of data elements will be constructed as a CSV (i.e. plain text) document. These documents will be made available to SAM users over SFTP and HTTP (i.e. via the SAM website).

The following requirements need to be met for Entity Management Extract interfaces:

  • SSH software is required to connect via SSH/SFTP

Extract information will be provided based on the sensitivity level of the data contained in that extract. Role-based access will be used to assign permissions to read data at each of these sensitivity levels. The four SAM extracts are:

  • Public – This data can be downloaded by any authenticated user (i.e. anyone who has registered for a SAM account). This file will contain data that is otherwise viewable to public users who browse the SAM website.
  • FOUO – This data will require permission via a role assignment. This file contains data included in the Public file as well as all data categorized as “For Official Use Only.” It excludes Sensitive and System Only information.
  • Sensitive – This data will require permission via a role assignment. This file contains data included in the Public and FOUO files, as well as data categorized as sensitive information about entities in SAM.
  • System Only – This data will require permission via a role assignment. This file contains all SAM data, including the information from the Public, FOUO, and Sensitive files. This file also contains information that is only pertinent for specific systems to share with one another.

In addition, the legacy MPIN extract will be carried forward and will be referred to as the SAM MPIN extract.

  • MPIN – This data will require permission via a role assignment. This file contains a small subset of data (4 elements): three elements included in the Public files (CAGE Code, DUNS, DUNS+4) and one element included in the Sensitive files (MPIN).

2.2 Interface File Formats

Extract data will be provided in two zip files:

  • Data File (.DAT) – This file will contain the actual extract data. It will be formatted as described in this document as well as the extract mapping document. See the file naming conventions section to understand the naming of this main extract file.
  • Metadata File (.MDAT) – This file will contain all column header information for the given data file, so that the user always has the most up-to-date header metadata available. See the file naming conventions section to understand the naming of this metadata extract file.

Please refer to the SAM MasterExtract Mapping spreadsheet for the list of data elements.

The Entity Management Extracts interface provides data pertaining to all entity organizations to authorized and authenticated users via Web Service, Secure FTP or the Extracts portion of the SAM website.

All extracts must be generated and compressed as .ZIP files due to large file sizes.

2.3 Interface Details

Table 2.1: Interface Details

Parameter / Details
Interface ID / INT047, INT048, INT049, INT050
Trigger Event / Scheduled monthly and daily
Implementation / Automated generation; manual retrieval
Protocols / SFTP, HTTP (Website)
Data Format / CSV
Frequency /
  • Extracts are generated from SAM data daily and monthly.
  • Extracts may be accessed daily/monthly (via SFTP or HTTP); accessing an extract will not, however, generate an ad-hoc report or refresh the data extract more frequently than once per day.

Authentication Method / A username and password are required for all extracts except Public, regardless of format or method of delivery.
Request Parameters / N/A
Response Parameters / N/A
Exception Conditions / If the generated extracts cannot be stored on the filesystem due to lack of storage capacity, a notification should be sent to the administrator with details about the extract and the error condition.

3 Naming Conventions

New refresh files are available on the first Sunday of each month. Daily update files are produced Tuesday through Saturday and are available on the server by 8:00 AM Eastern Time.

3.1 File Naming Conventions

As outlined below, each SAM extract will consist of two ZIP files, each with one file inside. The first ZIP file contains the extract data. The second ZIP file contains metadata, such as column headers, to be used with the data file. The file naming convention for the SAM Entity Management Extracts interface will be as follows:

The file that contains the extract data will be named as:

Sequence / Values
1 / SAM_
2 / [Sensitivity] (select from {PUBLIC, FOUO, SENSITIVE, SYSTEMONLY, MPIN})
3 / Underscore (_)
4 / [Refresh] (select from {MONTHLY, DAILY})
5 / Underscore (_)
6 / YYYYMMDD (date of receipt)
7 / .DAT

The file that contains the metadata will be named as:

Sequence / Values
1 / SAM_
2 / [Sensitivity] (select from {PUBLIC, FOUO, SENSITIVE, SYSTEMONLY, MPIN})
3 / Underscore (_)
4 / [Refresh] (select from {MONTHLY, DAILY})
5 / Underscore (_)
6 / YYYYMMDD (date of receipt)
7 / .MDAT

The metadata (MDAT) file will contain only the column headers separated by pipe (|) for the respective data elements in the (DAT) file.

The ZIP file containing the extract data will be named as:

Sequence / Values
1 / SAM_
2 / [Sensitivity] (select from {PUBLIC, FOUO, SENSITIVE, SYSTEMONLY, MPIN})
3 / Underscore (_)
4 / [Refresh] (select from {MONTHLY, DAILY})
5 / Underscore (_)
6 / YYYYMMDD (date of receipt)
7 / Underscore (_)
8 / DAT
9 / .ZIP

The ZIP file containing the metadata will be named as:

Sequence / Values
1 / SAM_
2 / [Sensitivity] (select from {PUBLIC, FOUO, SENSITIVE, SYSTEMONLY, MPIN})
3 / Underscore (_)
4 / [Refresh] (select from {MONTHLY, DAILY})
5 / Underscore (_)
6 / YYYYMMDD (date of receipt)
7 / Underscore (_)
8 / MDAT
9 / .ZIP

For example:

  • SAM_PUBLIC_MONTHLY_20110201_DAT.ZIP – This is the monthly refresh of the Public extract for February 2011 (made available in this example on February 1st). This zip file will contain SAM_PUBLIC_MONTHLY_20110201.DAT
  • SAM_PUBLIC_MONTHLY_20110201_MDAT.ZIP – This is the metadata file for the February 2011 refresh of the Public extract (made available in this example on February 1st). This zip file will contain SAM_PUBLIC_MONTHLY_20110201.MDAT
  • SAM_SYSTEMONLY_DAILY_20110210_DAT.ZIP – This is the daily update of the System Only extract, made available on February 10, 2011. This zip file will contain SAM_SYSTEMONLY_DAILY_20110210.DAT
  • SAM_SENSITIVE_DAILY_20110210_MDAT.ZIP – This is the daily metadata update of the Sensitive extract, made available on February 10, 2011. This zip file will contain SAM_SENSITIVE_DAILY_20110210.MDAT

3.2 SFTP and Web File Directory

The directory for storing SAM interface data will be organized as follows:

  • Current folder: in/current/entitymgmt/extracts
  • Processed folder: in/processed/entitymgmt/extracts
  • Archived folder: in/archived/year/entitymgmt/extracts
  • Error folder: in/error/entitymgmt
  • Error log folder: logs/interface/error/

These directories will be visible to only those users who have been granted access to them.

The folder structure below reflects the structure of both legacy and SAM (to be) extracts.

The data will be available on the SFTP server sftp.sam.gov.

4 Data Format

4.1 Delimited

The plain text extracts will be provided via SFTP and HTTP (website). The data delimiters used will be as follows:

  • Pipe (|) is used to separate single-valued data elements.
  • Tilde (~) is used to separate multiple occurrences of a data element. Within a concatenated string, it is the main delimiter that separates the individual concatenated strings.
  • Caret (^) is used to separate the values within a multi-valued string.

For example, in the concatenated string below:

BL^30000000^15000000^30000000^15000000~LS^42^5000000^0^0~PG^1^0^0^0

Two tildes (~) separate three strings:

  • BL^30000000^15000000^30000000^15000000,
  • LS^42^5000000^0^0 and
  • PG^1^0^0^0

The carets (^) separate the elements within each concatenated string.

Any delimiter characters (|, ~, and ^) that occur within data values will be removed (replaced with an empty string), so that they cannot be mistaken for separators.
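
The three delimiter levels can be unpacked with plain string splitting. The Python sketch below is illustrative only; the function names split_record and split_repeating are hypothetical. It assumes that an empty field means "no occurrences" and relies on the rule above that delimiter characters never appear inside data values, so no escaping logic is needed.

```python
def split_record(line: str) -> list:
    """Split one data record into its pipe-delimited fields."""
    return line.rstrip("\n").split("|")


def split_repeating(field: str) -> list:
    """Split a repeating field into occurrences (~) and their values (^)."""
    if field == "":
        # Assumption: an empty field carries no occurrences at all.
        return []
    return [occurrence.split("^") for occurrence in field.split("~")]


concatenated = "BL^30000000^15000000^30000000^15000000~LS^42^5000000^0^0~PG^1^0^0^0"
for occurrence in split_repeating(concatenated):
    # The first value of each occurrence is its code (BL, LS, PG).
    print(occurrence[0], occurrence[1:])
```

Applied to the example string, split_repeating returns three lists, one per tilde-separated string, each holding the caret-separated values in order.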

4.2 New Line Separators

  • Line feed will be used as the new line separator.

4.3 Header

The header, the first record in the data file, will contain the following elements:

Position / Values
1 / BOF
5 / [Sensitivity] (select from {PUBLIC, FOUO, SENSITIVE, SYSTEMONLY, MPIN})
16 / Start Date (YYYYMMDD) [00000000 for Monthly extracts]
25 / End Date (YYYYMMDD) [for Monthly extracts, this is the last date of data included in the extract]
34 / Number of records (7 digits), e.g. 102 records will be 0000102
42 / File Sequence Value (7 digits), e.g. 631 will be 0000631

For example, a header could appear as follows:

Daily File: BOF SENSITIVE 20121210 20121210 0000102 0000631

Monthly File: BOF SENSITIVE 00000000 20121203 0684123 0000631

4.4 Footer

The footer, the last record in the data file, will contain the following elements:

Position / Values
1 / EOF
5 / [Sensitivity] (select from {PUBLIC, FOUO, SENSITIVE, SYSTEMONLY, MPIN})
16 / Start Date (YYYYMMDD) [00000000 for Monthly extracts]
25 / End Date (YYYYMMDD)
34 / Number of records (7 digits), e.g. 102 records will be 0000102
42 / File Sequence Value (7 digits), e.g. 631 will be 0000631

For example, a footer could appear as follows:

Daily File: EOF SENSITIVE 20121210 20121210 0000102 0000631

Monthly File: EOF SENSITIVE 00000000 20121203 0684123 0000631
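
A consumer may wish to confirm that a downloaded data file is complete before loading it. The Python sketch below is illustrative only; the names Marker, parse_marker and check_file are hypothetical. It makes two assumptions that this document does not state: the header and footer fields are separated by whitespace (the records are parsed by splitting rather than by column position), and the record count excludes the header and footer lines themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    tag: str           # "BOF" for a header, "EOF" for a footer
    sensitivity: str
    start_date: str    # YYYYMMDD, or "00000000" for monthly extracts
    end_date: str      # YYYYMMDD
    record_count: int
    sequence: int


def parse_marker(line: str) -> Marker:
    """Parse a header or footer record into its six elements."""
    parts = line.split()
    if len(parts) != 6 or parts[0] not in ("BOF", "EOF"):
        raise ValueError(f"not a header/footer record: {line!r}")
    tag, sensitivity, start, end, count, sequence = parts
    return Marker(tag, sensitivity, start, end, int(count), int(sequence))


def check_file(lines: list) -> Marker:
    """Validate header, footer and record count; return the parsed header."""
    header = parse_marker(lines[0])
    footer = parse_marker(lines[-1])
    if header.tag != "BOF" or footer.tag != "EOF":
        raise ValueError("file must start with BOF and end with EOF")
    # Apart from the tag, the footer is expected to repeat the header values.
    if (header.sensitivity, header.start_date, header.end_date,
            header.record_count, header.sequence) != (
            footer.sensitivity, footer.start_date, footer.end_date,
            footer.record_count, footer.sequence):
        raise ValueError("header and footer do not agree")
    data_rows = len(lines) - 2  # assumption: count excludes header and footer
    if data_rows != header.record_count:
        raise ValueError(
            f"expected {header.record_count} records, found {data_rows}")
    return header
```

A file that fails any of these checks was probably truncated in transit or replaced while being downloaded, and should be retrieved again rather than loaded.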

5 Rules and Policies

5.1 File Posting Date / Time

SAM daily extracts reflect all the changes made in SAM on the previous day. SAM daily extracts will be posted Tuesday through Saturday between 12 AM and 7 AM EST. They will include data that has changed since the previous execution of the daily extract.

SAM monthly extracts will be full-volume extracts containing all the data that matches the data extraction criteria. SAM monthly extracts will be posted on the 1st Sunday of every month between 12:01 AM and 11:59 PM.
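
A consumer that polls for new files can derive the expected posting days from the schedule above. The Python sketch below is illustrative only; the function names first_sunday and is_daily_posting_day are hypothetical, and it covers only the nominal schedule, not delayed or replacement postings.

```python
from datetime import date, timedelta


def first_sunday(year: int, month: int) -> date:
    """Return the first Sunday of the month, the monthly posting day."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0 and Sunday is 6
    return first + timedelta(days=(6 - first.weekday()) % 7)


def is_daily_posting_day(day: date) -> bool:
    """Return True for Tuesday through Saturday, the daily posting days."""
    return 1 <= day.weekday() <= 5


print(first_sunday(2013, 4))                    # first Sunday of April 2013
print(is_daily_posting_day(date(2013, 4, 9)))   # a Tuesday
```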

Under certain scenarios, such as data corruption, database downtime, or application upgrades, there might be a delay in posting the daily / monthly extracts.

The SAM data quality team will make every effort to post clean data; however, in case of data corruption or file damage of extracted data, the following process will be followed:

  • For a Daily file: If data corruption is identified the same day, the corrupt daily file will be replaced with a new file. If the data corruption is identified in the following days, all the files posted since the point of corruption will be considered corrupt and will be removed and replaced by a single range file. The range file shall contain all the changes from the point of data corruption. The start date in the file header / footer in this case will be the date the first data corruption was identified, and the end date will be the execution date minus one day. The range file will be in the same format as the Daily files.
  • For a Monthly file: The corrupt monthly file will be removed and a clean monthly extract file will be posted. This extract will be based on data as it stands at the point of execution (not necessarily the 1st Sunday of the month in this case).

The government has established data quality standards for the daily and monthly files, which tolerate a minimal number of errors in a posted file. These standards will be followed; when a file does not meet quality standards, the government will be alerted and involved in the contingency decision. In addition, there is a 99% quality target for posting time.

5.2 Processing Rules

For the daily extracts, SAM will make data available to authorized and authenticated users via Secure FTP or the Extracts portion of the SAM website. For the monthly extracts, users will be able to retrieve a “snapshot” of all active entity records. This service allows users who have been previously authorized and issued a user name and password to retrieve information about vendors who are registered in the CCR database.

5.3 Interface Archive Policy

As the data in this interface is derived from the Entity Management records, archiving rules follow entity record archive rules. For detailed information, see the SAM Archive Policy.

5.4 Data Persistence Policy

Interface data will persist for a period of 30 days, after which the data will be deleted and will no longer be available on either the website or the SFTP folder. This means that a monthly file will be kept until the next monthly file is available, at which time it will be removed. There will be 30 days’ worth of daily files posted to the site at all times.

5.5 Interface Deletion Policy

As the data in this interface is derived from the Entity Management records, deletion rules follow entity record deletion rules.

5.6 Refresh Policy

The CSV extracts will be generated monthly and daily. The daily files will offer a record for all completed transactions taking place within Entity Management since the last daily file generation and are available for 30 days. The refresh files will contain a record for every active profile and for every profile that expired less than six months before the time of generation.

6 Security

SSH software is required to connect via SSH/SFTP to the SAM Extract server, using the commands and syntax appropriate to that software. Each user will be assigned a Username and Password allowing access to the extract files on the SAM Server. Each Username will be assigned permission to access these extracts on an as-needed basis for the type of data wanted. In addition, IP-based authentication will be used for SFTP file transfer.

7 Interface Agreement

Interface agreements will be established with organizations whose systems are consumers of SAM extract data.

7.1 Policies regarding the dissemination of SAM data

  • All SAM extract data must be handled as stated in the Non-Disclosure Agreement associated with the approval for access to such extracts.
  • Data is to be handled by the individual (data receiver) who has prior approval and is listed on the user profile.
  • All extracts, with the exception of the Public extract, must be downloaded only to a secured government network that has the Authority to Operate as a government system. This data must never reside within a commercial or unsecured environment.
  • The transfer of data must follow the security guidelines and methodologies set forth in the Interconnection Security Agreement (ISA).

8 Extract Grain & Inclusion Criteria

8.1 Extract Grain (Uniqueness Criteria)

The extracted data will be unique on the combination of DODAAC, DUNS, and DUNS_plus4.

8.2 Inclusion Criteria

Only the following records will be included in the daily / monthly extracts.

Monthly Full Volume Extract

Criterion / Monthly / Daily
Active Records / All active records are included. / Only records that changed since the last execution are included.
Expired Records / Only 6 months of expired records. / Only records that expired since the last execution are included.
De-activated Records / Not included. / Only records that have been deactivated since the last execution are included.
Changed Data / All active records and only 6 months of expired records. / Only records changed since the previous execution are included.
SAM Extract Code / Active ('A') and Expired ('E'): send complete record. /
  • 1 = Delete: send DUNS, DUNS+4, DODAAC, CAGE Code
  • 2 = New: send complete record
  • 3 = Update: send complete record
  • 4 = Expired: send DUNS, DUNS+4, DODAAC, CAGE Code

The following criteria apply to both the monthly and daily extracts:

Criterion / Rule
DUNS or DODAAC / Either DUNS or DODAAC is available.
If multiple records are found (based on the uniqueness criteria) /
  • If an active record, an expired record, and a deactivated record are present, keep the active record and exclude the expired and deactivated records.
  • If expired and deactivated records are present, keep the expired record and exclude the deactivated record.

Delete Flag / delete_flag not equal to 'N'
CAGE or NCAGE / CAGE code OR NCAGE code is not null
CAGE Match / Cage_Match_Flag = 'Y'
Record Type / Entity Management Records only (Exclusions and Hierarchy records not to be included)
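
The rule for multiple records sharing the same uniqueness key amounts to a priority order: active before expired, expired before deactivated. The Python sketch below is illustrative only and is not the SAM implementation; the function name pick_records, the dictionary keys (dodaac, duns, duns_plus4, status) and the status labels are hypothetical. It also assumes that an active record is preferred when only active and deactivated records are present, a case the rules above do not spell out.

```python
# Lower number means higher priority when several records share a key.
STATUS_PRIORITY = {"ACTIVE": 0, "EXPIRED": 1, "DEACTIVATED": 2}


def pick_records(records: list) -> list:
    """Keep one record per (DODAAC, DUNS, DUNS+4) key, preferring
    active over expired over deactivated."""
    best = {}
    for record in records:
        key = (record["dodaac"], record["duns"], record["duns_plus4"])
        current = best.get(key)
        if (current is None or
                STATUS_PRIORITY[record["status"]]
                < STATUS_PRIORITY[current["status"]]):
            best[key] = record
    # dicts preserve insertion order, so output follows first appearance
    return list(best.values())


candidates = [
    {"dodaac": "", "duns": "123456789", "duns_plus4": "", "status": "EXPIRED"},
    {"dodaac": "", "duns": "123456789", "duns_plus4": "", "status": "ACTIVE"},
    {"dodaac": "", "duns": "987654321", "duns_plus4": "", "status": "DEACTIVATED"},
]
for record in pick_records(candidates):
    print(record["duns"], record["status"])
```

The DUNS values shown are placeholders, not real entity identifiers.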

9 Attribute List

<Attribute list from SAM MasterExtract Mapping document>

10 Data Standardization Rules

This section refers to the data standardization rules that will be implemented during the process of extracting the data from SAM.

Address Data Standardization

  1. All address information will be in upper case.
  2. US states will be represented by the 2-character US state code.
  3. US ZIP codes will be either 5 digits or 9 digits.
