Monday, April 2, 2018

Best Practices in ETL - BODS

Code Naming Conventions

The Data Services objects should be named to make it as easy as possible to identify their function and object type. The following table contains the available objects in Data Services and their proposed naming convention.

Object                            Naming Convention
Project                           PJ_<Project Subject Area>_Batch
Job (Batch/Real-time)             JB_<RICEFID>_<name>
Work Flow                         WF_<RICEFID>_<name>
Data Flow                         DF_<RICEFID>_<SubjectArea Segment>_<Action>
Annotation                        ANN_<RICEFID>_<description>
Catch                             CATCH_<RICEFID>_<description>
Conditional                       COND_<RICEFID>_<description>
Datastore                         DS_<SOURCETYPE>_<SOURCENAME>
Document                          DOC_<RICEFID>_<description>
DTD (Document Type Definition)    DTD_<RICEFID>_<description>
File Format                       FF_<RICEFID>_<ACTION>_<DESCRIPTION>
Function                          FUNC_<description>
Script                            SC_<RICEFID>_<description>
Template Table                    TT_<RICEFID>_<description>
Try                               TRY_<RICEFID>_<description>
While Loop                        WHILE_<description>
XML Message                       XMLMSG_<RICEFID>_<description>
XML Schema                        XMLSCH_<RICEFID>_<description>
XML Template                      XMLTMPL_<RICEFID>_<description>
Address_Enhancement Transform     AE_<RICEFID>_<description>
Case Transform                    CASE_<description>
Date_Generation Transform         DG_<RICEFID>_<description>
Effective_Date Transform          ED_<RICEFID>_<description>
Hierarchy_Flattening Transform    HF_<RICEFID>_<description>
History_Preserving Transform      HP_<RICEFID>_<description>
Key_Generation Transform          KG_<RICEFID>_<description>
Map_CDC_Operation Transform       MC_<RICEFID>_<description>
Map_Operation Transform           MO_<RICEFID>_<description>
Match_Merge                       MM_<RICEFID>_<description>
Merge                             MER_<RICEFID>_<description>
Name_Parsing                      NP_<RICEFID>_<description>
Pivot                             PIV_<RICEFID>_<description>
Reverse Pivot                     RPIV_<RICEFID>_<description>
Query                             QRY_<RICEFID>_<description>
Row_Generation                    RG_<RICEFID>_<description>
SQL                               SQL_<RICEFID>_<description>
Table_Comparison                  TC_<RICEFID>_<description>
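Because the conventions in the table above are purely mechanical, they can be checked automatically, for example before objects are checked in to the central repository. The sketch below is plain Python, not a Data Services feature; the regular expressions are assumptions based on the table (in particular, <RICEFID> is assumed to look like CV50001: two letters followed by five digits).

```python
import re

# Hypothetical patterns derived from the naming-convention table above.
# RICEFID is assumed to be two letters plus five digits (e.g. CV50001);
# names/descriptions are assumed to be words joined by underscores.
RICEFID = r"[A-Z]{2}\d{5}"
NAME = r"[A-Za-z0-9_]+"

CONVENTIONS = {
    "Job":       rf"^JB_{RICEFID}_{NAME}$",
    "Work Flow": rf"^WF_{RICEFID}_{NAME}$",
    "Data Flow": rf"^DF_{RICEFID}_{NAME}$",
    "Script":    rf"^SC_{RICEFID}_{NAME}$",
    "Datastore": r"^DS_[A-Z]+_[A-Za-z0-9]+$",
}

def validate(object_type: str, object_name: str) -> bool:
    """Return True if the name matches the convention for its object type."""
    return re.match(CONVENTIONS[object_type], object_name) is not None

print(validate("Job", "JB_CV50001_SALES_LOAD"))  # True
print(validate("Datastore", "DS_SAP_VR2"))       # True
print(validate("Work Flow", "SalesWorkflow"))    # False
```

A check like this is easy to extend with one pattern per row of the table, and can be run over an object-name export as a pre-release review step.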

Other Naming Standards

The naming standards for other components that may be involved in the BOBJ DS process are listed below.
  • Global Variables should be named with a prefix of “$GV_”.
  • Local Variables should be named with a prefix of “$V_”.
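The two prefix rules above are simple enough to lint for. A toy check in plain Python (not a Data Services feature; the function name and the example variable names are illustrative):

```python
def variable_prefix_ok(name: str, scope: str) -> bool:
    """Check a DS variable name against the prefix convention:
    global variables start with $GV_, local variables with $V_."""
    prefix = {"global": "$GV_", "local": "$V_"}[scope]
    return name.startswith(prefix)

print(variable_prefix_ok("$GV_LoadDate", "global"))  # True
print(variable_prefix_ok("$V_RowCount", "local"))    # True
print(variable_prefix_ok("$GV_RowCount", "local"))   # False
```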
Project
Format:  PJ_<Project Subject Area>_Batch
Example: PJ_PANGAEA-PHASE1-CONV_BATCH
Projects are the highest level of reusable object in Data Services.  They allow you to group jobs that have dependent schedules or belong to the same application.  <Project Subject Area> is the general application area.  PJ is an abbreviation for Project.  The _Batch suffix signifies that all ETL jobs in this conversion project execute in batch mode.
Job
Format: JB_<RICEFID>_TARGETSCHEMANAME
Example: JB_CV50001_SALES_LOAD
A job is a group of objects that you can schedule and execute together.  Multiple jobs can be included in a single project.  When naming a job, include a description of the data managed within the job.  In the example above, the job loads sales data to a data mart.  JB is an abbreviation for Job.  When designing jobs, limit the scope of each job to a particular area of your business, so that the purpose of the job is clear from its name.  Keeping jobs limited in scope also makes it easier to reuse them in other projects.

Datastore
Format: DS_<SOURCETYPE>_<SOURCENAME>
Example:  DS_SAP_VR2
A Datastore provides connection information for a source or target database.  Data Services is able to import metadata through the datastore connection.  SOURCETYPE could be SAP, TD for Teradata, ORCL for Oracle, and so on.  SOURCENAME could be the schema name of the database or just a descriptive name for the source or target database.  DS stands for datastore.
Workflow
Format: WF_<RICEFID>_TARGETTABLE
Example: WF_CV50002_SALES_DIM
               WF_CV50007_CUSTOMER
Workflows can contain other workflows as well as dataflows.  TARGETTABLE refers to the final table in the workflow that is being loaded.  In instances where a workflow contains multiple workflows, it can refer to a category or group of tables.  In the examples, WF_CV50002_SALES_DIM holds all the dimension table loads for the SALES data mart application.  One of its workflows is WF_CV50007_CUSTOMER, which loads only customer data.  WF stands for workflow.
Dataflow
Format: DF_<RICEFID>_<SubjectArea Segment>_<Action>
Example: DF_CV50007_CUSTOMER_EXTRACTFILE
               DF_CV50007_CUSTOMER_TRANSFORM
               DF_CV50007_CUSTOMER_OUTPUTFILE
                       
Dataflows are the lowest-granularity objects in Data Services.  Multiple dataflows can reside in a workflow.  Dataflows hold the specific query or transformation that is being applied to the source data.  <SubjectArea Segment> refers to the data the dataflow processes, and <Action> to the step it performs.  DF stands for dataflow.  In the example, the Customer table data is extracted from a file, the data is transformed in the second dataflow, and in the third dataflow a Customer output file is generated for further processing.
Scripts
Format: SC_<RICEFID>_DESCRIPTION
Example: SC_CV50007_CLEANUP_TEMP_TABLES
Scripts can be created in Jobs or Workflows.  They are single-use objects and are not stored in the Data Services object library.  They can be used to declare and assign variables, call a SQL function or perform SQL statements.  DESCRIPTION can refer to the function or task that the script is performing or the name of a table that the script is loading.  SC stands for script.  Note: descriptive names could include SQL update of table X, Print Message, etc.  The example above describes a script that truncates the temp tables for the job.
Flat Files
Format: FF_<RICEFID>_<ACTION>_DESCRIPTION
Example: FF_CV50009_OUT_ORA_APPS_SALES_DETAIL
Flat files are data sources that are not in database table form.  They are typically used in dataflows to create tables or as a source of metadata.  The example describes the target table that the file will populate from the Oracle application.  FF stands for flat file.
Tables
Format:  DESCRIPTION
Example:  KNA1
Tables can be used as a source or target in a dataflow.  DESCRIPTION refers to the data content in the table.  The example names the database table that will house CUSTOMER data.


Source : SAP
Keep It Simple and Sweet - Naming Conventions for Quality Development

A strict naming schema for all DS objects (projects, jobs, workflows, dataflows, datastores, file formats, custom functions) is essential when working in a multi-user environment. The central repository has no folder concept, hierarchy or grouping functionality; the only way to distinguish objects of one grouping from another is by name. The most effective approach for naming objects is based on prefixing.
Note: In order to display the full names of DS objects in the Designer workspace, increase the icon name length. You do this by selecting Tools –> Options from the Designer menu, then expand Designer and select General. In this window, specify the value of 64 in the “Number of characters in workspace icon name” box.
General note: Versioning should not be handled by naming conventions. So, never include a version number in an object name. Use the central repository concept for maintaining successive versions of any object.

1.  Reusable objects

Object                                     Naming Convention                                              Example
Project                                    <project_name>                                                 BI4B
Job                                        <project_name>_<job_name>                                      BI4B_D
Workflow contained in one job only         <project_name>_<job_name>_[XT|TF|LD|AG…]                       BI4B_D_XT
                                           <project_name>_<job_name>_[XT|TF|LD|AG…]_<workflow_name>       BI4B_D_LD_Facts
                                           <project_name>_<job_name>_<workflow_name>                      BI4B_D_Init
Workflow that is reused                    <project_name>_COMMN_[XT|TF|LD|AG…]_<workflow_name>            BI4B_COMMN_Init
                                           COMMN_[XT|TF|LD|AG…]_<workflow_name>                           COMMN_ErrorHandling
Dataflow contained in one job only         <project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>       BI4B_D_LD_Opportunities
Dataflow that is reused                    <project_name>_COMMN_[XT|TF|LD|AG…]_<dataflow_name>            BI4B_COMMN_LD_JobCycles
                                           COMMN_[XT|TF|LD|AG…]_<dataflow_name>                           COMMN_LD_Jobs
Embedded Dataflow                          <project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>_EMB   BI4B_D_LD_Employees_EMB
ABAP Dataflow                              <project_name>_<job_name>_XT_<dataflow_name>_ABAP              BI4B_D_XT_KNA1_ABAP
Custom Function contained in one job only  <project_name>_<function_name>                                 BI4B_getDate
Custom Function that is reused             COMMN_<function_name>                                          COMMN_dateKey

1.1. Projects: <project_name>

Give every DS project a 5-character short name. The name has to be short, because it will be used as a prefix for the name of all reusable objects defined within the project.
E.g.: P2345

1.2. Jobs: <project_name>_<job_name>

Give every job a 5-character short name. Use <project_name>_ as a prefix for the job name. The name has to be short, because it will be used as a prefix for the name of all workflows and dataflows defined within that job.
E.g.: P2345_J2345

1.3. Workflows: <project_name>_<job_name>_[XT|TF|LD|AG…][_<workflow_name>]

Name every workflow with <project_name>_<job_name>_ as a prefix. Use COMMN_ as the prefix for shared workflows used across projects, and <project_name>_COMMN_ when a workflow is used in multiple jobs within a given project.
Workflows are often used to group dataflows for serial or parallel execution. In a typical ETL job, dataflows are executed in “stages”: a first set of dataflows has to be executed (in parallel) before a next set can be started; and so on. A data warehouse loading job may extract data from the sources, load it into staging, optionally transform from staging-in to staging-out, then load into the core EDW and aggregate into the semantic layer.
Distinguish between job stages by extending the prefix with a two-character code:
  • XT: extract from source to staging
  • TF: transform from staging-in to staging-out
  • LD: load from staging into the core EDW layer
  • AG: load (physically aggregate) from core to semantic layer
The workflow name will be used as a prefix for the name of all embedded workflows and dataflows.
E.g.: P2345_J2345_XT
Within a workflow, objects (scripts, sub-workflows, dataflows) must either all be defined in parallel or all sequentially, and will be executed as such. There is no limit to the number of objects within a workflow. When the number of objects is higher than the number of processors available, DS will internally control the execution order of embedded parallel objects. Only when there are fewer objects than available processors will they really be executed in parallel.
Complex hierarchical structures can be defined by nesting workflows. There is no limit to the number of nesting levels, either. With nested workflows, use a name (_Facts for facts extraction or load, _Dims for dimension processing…) combined with an outline numbering scheme (1, 11, 111, 112, 12, 2…).
E.g.: P2345_J2345_LD_Dims21
Some workflows may not contain dataflows at all; they contain only non-reusable objects. In that case, just name the workflow according to its function.
E.g. for a workflow embedding an initialization script: P2345_J2345_Initialise
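Because this scheme is entirely compositional, the names can be built programmatically, for example in a generator for job skeletons. A small sketch in plain Python (the helper and its argument names are illustrative, not part of DS):

```python
# Stage codes per the convention above: XT (extract), TF (transform),
# LD (load), AG (aggregate).
STAGES = {"XT", "TF", "LD", "AG"}

def ds_name(project: str, job: str, stage: str = None, name: str = None) -> str:
    """Build <project>_<job>[_<stage>][_<name>] per the naming scheme."""
    parts = [project, job]
    if stage is not None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage code: {stage}")
        parts.append(stage)
    if name is not None:
        parts.append(name)
    return "_".join(parts)

print(ds_name("P2345", "J2345", "XT"))               # P2345_J2345_XT
print(ds_name("P2345", "J2345", "LD", "Dims21"))     # P2345_J2345_LD_Dims21
print(ds_name("P2345", "J2345", name="Initialise"))  # P2345_J2345_Initialise
```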

1.4. Dataflows

DS supports three types of dataflows. The dataflow names must be unique across the different types. To distinguish the embedded and ABAP dataflows from the regular ones, use a suffix in their name.
  • Regular dataflows: <project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>
According to design and development best practices there should only be a single target table in a dataflow. Name a dataflow according to that target table.
Use <project_name>_<job_name>_ as a prefix. Use COMMN_ as the prefix for shared dataflows used across projects, and <project_name>_COMMN_ when a dataflow is used in multiple jobs within a given project. Distinguish between dataflow locations (extract, transform, load, aggregate…) by extending the prefix with the stage code (XT_, TF_, LD_, AG_…) of the embedding workflow.
E.g.: P2345_J2345_XT_S_TABLE1, P2345_J2345_LD_TargetTable
  • Embedded dataflows: <project_name>_<job_name>_[XT|TF|LD|AG…]_<dataflow_name>_EMB
Name every embedded dataflow with <project_name>_<job_name>_ as a prefix; use _EMB as a suffix for the dataflow name. Distinguish between dataflow locations (extract, transform, load and aggregate) by extending the prefix with the stage code (XT_, TF_, LD_ and AG_).
E.g.: P2345_J2345_LD_TargetTable_EMB
  
  • ABAP dataflows: <project_name>_<job_name>_XT_<dataflow_name>_ABAP
An ABAP dataflow is always used as a source in a regular dataflow. Reuse the name of that regular dataflow and add _ABAP as a suffix to make it unique.
E.g.: P2345_J2345_XT_S_TABLE1_ABAP
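Since the embedded and ABAP variants only add a suffix, all three dataflow names can be derived from one base name. A throwaway sketch in plain Python (the helper name is an assumption):

```python
def dataflow_names(base: str) -> dict:
    """Derive regular, embedded and ABAP dataflow names from one base name,
    per the suffix convention above (_EMB, _ABAP)."""
    return {
        "regular":  base,
        "embedded": base + "_EMB",
        "abap":     base + "_ABAP",
    }

names = dataflow_names("P2345_J2345_XT_S_TABLE1")
print(names["abap"])      # P2345_J2345_XT_S_TABLE1_ABAP
print(names["embedded"])  # P2345_J2345_XT_S_TABLE1_EMB
```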

1.5. Custom Functions: <project_name>_<function_name>

Give every Custom Function a descriptive name. Use <project_name>_ as a prefix. Use COMMN_ as the prefix for shared custom functions used across projects.
E.g.: P2345_TrimBlanksExt

2.  Datastores: [SAP|BWS|BWT|HANA…]_<datastore_name>

As datastores are often used in multiple projects, they do not follow the same naming conventions as other reusable objects.
Name a datastore in line with its physical name, and make the prefix depend on the object’s type:
Datastore Type      Database Type  Naming Convention  Example
Database            SQL Server     SQL_               SQL_OC4A1
Database            Oracle         ORA_               ORA_ITSD
Database            Teradata       TD_                TD_Staging
Database            DB2            DB2_               DB2_MDM
Database            MySQL          MySQL_             MySQL_GEB
Database            Sybase ASE     ASE_               ASE_CRN
Database            Sybase IQ      IQ_                IQ_CMDWH
SAP Applications                   SAP_               SAP_MDR
SAP BW as a source                 BWS_               BWS_Achme
SAP BW as a target                 BWT_               BWT_Hana
SAP HANA                           HANA_              HANA_DB
Adapter                            AS_                AS_Nexus
Web Services                       WS_                WS_Weather
Note 1: Pay attention when choosing datastore names. A datastore name cannot be changed once the object has been created.
Note 2: Landscape-related information should not be handled with datastore names. So, never include a physical system indicator (DEV, T, QA…) in a datastore name. Landscape information should be configured using datastore configurations. Create one datastore, then create a datastore configuration for every tier (development, test, QA, production…) in the landscape.

3.  File formats: <project_name>_<file_format_name>

Reuse the file name for the format name of a project-specific file. Use <project_name>_ as a prefix.
E.g.: P2345_ISO_Cntr_Codes
Note: Pay attention when choosing a file format name. A file format name cannot be changed once the object has been created.

4.  Non-reusable objects

Because non-reusable objects are only defined within the context of a workflow or a dataflow, no strict naming standards are necessary. Their names only serve documentation purposes.
Use meaningful names for workflow objects (Script, Condition, While Loop, Try, Catch).
Do not change the transform names unless you want to stress the specific purpose of a Query transform, e.g. Join, Order, OuterJoin… 

