Which Subscription data for Lifetime Analytics?

Purpose

In this article, the following terms are defined:

“Operator” : the client Telecom operator using Lifetime Analytics (LTA)
“Subscriber” or “Customer” : the customer of the Telecom Operator

The Lifetime Analytics application uses monthly snapshots of the Operator's subscription data to:

detect and analyze churn, up-sell and cross-sell clusters
design action plans targeting selected subscriptions
monitor the impact of these actions on the churn rate, the ARPU and other business metrics

Lifetime Analytics does NOT directly contact or identify subscribers. The actions designed with Lifetime Analytics must be executed by the Operator through its own CRM or campaign management tools. Therefore, Lifetime Analytics does not expect any contact information from the Operator (see Data Privacy Management).

Each monthly snapshot is a complete picture of the customers' data at the end of the month (the “period”), including the active subscriptions (BoP), the gross adds and the churners (customer and operator churn) for all product families and for residential, SME and corporate customers. The migrations (in and out) are computed by LTA from the changes in product family and marketing product over the months.
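To illustrate how a migration can be detected from the change of product family or marketing product between two monthly snapshots, here is a minimal Python sketch. It is not the LTA implementation: the snapshot shape, the field names and the simplifying assumption of one subscription per customer are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: customer_id -> (product_family, marketing_product).
# Simplifying assumption: one subscription per customer.
Snapshot = Dict[str, Tuple[str, str]]


def detect_migrations(previous: Snapshot, current: Snapshot) -> List[dict]:
    """Return one record per customer whose product family or marketing
    product changed between two consecutive monthly snapshots."""
    migrations = []
    for customer_id, (old_family, old_product) in previous.items():
        if customer_id not in current:
            continue  # not in the current snapshot: not treated as a migration here
        new_family, new_product = current[customer_id]
        if (old_family, old_product) != (new_family, new_product):
            migrations.append({
                "customer_id": customer_id,
                "migration_out": f"{old_family}/{old_product}",
                "migration_in": f"{new_family}/{new_product}",
                "family_changed": old_family != new_family,
            })
    return migrations


# Example: C1 moves from a 5 GB to a 20 GB mobile plan (same product family).
feb = {"C1": ("mobile", "M_5GB"), "C2": ("fixed_bb", "FIBER_1G")}
mar = {"C1": ("mobile", "M_20GB"), "C2": ("fixed_bb", "FIBER_1G")}
changes = detect_migrations(feb, mar)
```

Flagging `family_changed` separately distinguishes a change of offer inside a product family from a move to another product family.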

The main categories of expected subscription data are:

Essential information about the subscription contract
Billed revenue
Billing and missed payment
CEX / Customer survey
Contract Changes
Churn & disconnection reasons from CRM
Commitment
Complaints / Truck rolls
Touchpoints / contacts
Distribution channels
Fixed Broadband access
Fixed Broadband allowance
Fixed Broadband device characteristics
Fixed Broadband – usage of 3rd Party SVOD pay TV
Fixed broadband - Data usage
Fixed broadband quality / incidents
Fixed broadband quality / optical
Fixed broadband quality / wifi
Fixed telephony quality / incidents
Fixed telephony – voice usage – inbound
Fixed telephony – voice usage – outbound
Fixed voice allowance
Loyalty program
Market data / competitor offers
Mobile Handset device characteristics - IMEI / TAC code
Mobile Postpaid - Mobile data allowance
Mobile Postpaid - SMS / MMS allowance
Mobile Postpaid - Voice allowance - Outbound
Mobile access
Mobile network access
Mobile network availability
Mobile telephony – mobile data usage
Mobile telephony – voice usage – inbound
Mobile telephony – voice usage – outbound
Pay TV access
Pay TV allowance - Channel
Pay TV allowance - Viewing option
Pay TV device characteristics
Pay TV quality / incidents
Pay TV – Device and connections
Pay TV – usage of Operator SVOD pay TV
Pay TV – usage of TVOD pay TV
Pay TV – usage of catch-up pay TV
Pay TV – usage of linear pay TV

Additionally, Lifetime Analytics automatically computes more than 500 data fields from the provided data to enrich the data model. Lifetime Analytics can also enrich the Mobile Handset IMEI / TAC code with 200 device characteristics, giving a deep understanding of the mobile handsets in use.

Not all data are mandatory. Lifetime Analytics defines a data model with priority levels (Mandatory, P1, P2, P3) so that the available data can be provided progressively.

The data model is defined directly in the application and can be configured by users to customize or extend it for the Operator's own needs (see Data Reference).

Definition of Data

The complete list of expected and computed data is available in the Lifetime Analytics Data Reference catalog (Menu Data Management > Data Reference > Export).

The Data Reference file is an Excel file describing each data field:

CATEGORY : the category of the data (e.g. billing, ...)
PRODUCT_SCOPE : the related RGUs separated by a comma (,), or ALL if it applies to any RGU. It must be aligned with the RGU Reference.
IS_COMPUTED : whether the data is computed by Lifetime Analytics (yes) or expected from the Operator (no)
FEATURE : name of the data field, in snake case notation (e.g. my_own_data_field)
LEVEL : the attachment level of the data (customer, product or technical)
DEFINITION : definition of the expected or computed data
DTYPE : data type, one of:
- text : free text or category name
- boolean : yes/no, true/false, 0/1
- currency : numeric value representing a monetary amount
- number : generic number (e.g. number of voice minutes)
- percent : percentage
- date : ISO format date YYYY-MM-DD
- year : year as a number (YYYY)
- gender : M/F
- zip code
MANDATORY : YES means the data is required; P1 to P3 define different priority levels; NO means the data is informative, or mandatory only in specific cases (see DEFINITION).
IS_DIMENSION : whether the data is used by Lifetime Analytics as an analysis dimension of the churn dynamics dashboard
IS_FACTOR : whether the data is used by Lifetime Analytics as a potential churn/up-sell/cross-sell factor in cluster discovery (see What is a Cluster?).
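The attributes above can be used to check an extract before submission. The following is a minimal Python sketch, assuming one Data Reference row is held in a small dataclass; the class, the RGU codes (MV, FBB, PTV) and the exact parsing rules (for example, case-insensitive booleans and percentages written as plain numbers without a % sign) are hypothetical choices, not the application's own validation.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumption: boolean spellings are accepted case-insensitively.
BOOLEAN_VALUES = {"true", "false", "yes", "no", "0", "1"}


@dataclass
class DataReferenceEntry:
    feature: str        # FEATURE, snake_case field name
    dtype: str          # DTYPE, e.g. "date", "boolean", "currency"
    product_scope: str  # PRODUCT_SCOPE, "ALL" or comma-separated RGUs
    mandatory: str      # MANDATORY: YES, P1, P2, P3 or NO
    is_computed: bool   # IS_COMPUTED

    def applies_to(self, rgu: str) -> bool:
        """True if this field applies to the given (hypothetical) RGU code."""
        if self.product_scope.strip().upper() == "ALL":
            return True
        return rgu in {s.strip() for s in self.product_scope.split(",")}


def is_valid_value(dtype: str, raw: str) -> bool:
    """Check one raw text value against a DTYPE; an empty value means null."""
    value = raw.strip()
    if value == "":
        return True
    if dtype == "boolean":
        return value.lower() in BOOLEAN_VALUES
    if dtype in ("currency", "number", "percent"):
        # Point as decimal separator, no thousands separator.
        return re.fullmatch(r"-?\d+(\.\d+)?", value) is not None
    if dtype == "date":
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is None:
            return False
        try:
            date.fromisoformat(value)  # rejects impossible dates such as 2021-02-30
        except ValueError:
            return False
        return True
    if dtype == "year":
        return re.fullmatch(r"\d{4}", value) is not None
    if dtype == "gender":
        return value in ("M", "F")
    return True  # text and zip code are accepted as free text in this sketch
```

For example, `is_valid_value("number", "1,000.50")` returns False because of the thousands separator, while `"1000.50"` is accepted.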

For a proof of concept (POC), only the mandatory data (at least MANDATORY = YES and P1; some P2 data can be very valuable if available) that are not computed (IS_COMPUTED = no) are expected.
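As a sketch of this selection rule, the function below filters Data Reference rows down to the fields expected for a POC. It assumes the exported Excel rows have already been loaded as dictionaries keyed by column name (MANDATORY, IS_COMPUTED, FEATURE); the loading step and the example feature names are hypothetical.

```python
from typing import Iterable, List


def poc_fields(rows: Iterable[dict], include_p2: bool = False) -> List[str]:
    """Return the FEATURE names the Operator should provide for a POC:
    MANDATORY in (YES, P1[, P2]) and not computed by Lifetime Analytics."""
    wanted = {"YES", "P1"} | ({"P2"} if include_p2 else set())
    selected = []
    for row in rows:
        mandatory = str(row["MANDATORY"]).strip().upper()
        computed = str(row["IS_COMPUTED"]).strip().lower() in ("yes", "true", "1")
        if mandatory in wanted and not computed:
            selected.append(row["FEATURE"])
    return selected


# Hypothetical Data Reference rows.
rows = [
    {"FEATURE": "customer_id", "MANDATORY": "YES", "IS_COMPUTED": "no"},
    {"FEATURE": "billed_revenue", "MANDATORY": "P1", "IS_COMPUTED": "no"},
    {"FEATURE": "tenure_months", "MANDATORY": "P1", "IS_COMPUTED": "yes"},
    {"FEATURE": "nps_score", "MANDATORY": "P2", "IS_COMPUTED": "no"},
]
poc = poc_fields(rows)  # ["customer_id", "billed_revenue"]
```

Passing `include_p2=True` adds the P2 data that are worth sending when available.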

Subscriber data are expected for at least the last 12 months at the technical product (RGU) level, with an initial load of 36 months of historical data on a restricted scope (Mandatory data only).
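Since each snapshot describes the situation at the end of a month, the periods to extract can be listed as month-end dates. A minimal Python sketch, where the function name and the choice of March 2021 as the latest period are only illustrative:

```python
import calendar
from datetime import date
from typing import List


def month_end_periods(last_period: date, months: int) -> List[date]:
    """Return `months` consecutive month-end dates, oldest first,
    ending with the month of `last_period`."""
    periods = []
    year, month = last_period.year, last_period.month
    for _ in range(months):
        last_day = calendar.monthrange(year, month)[1]  # handles leap years
        periods.append(date(year, month, last_day))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(periods))


initial_load = month_end_periods(date(2021, 3, 31), 36)  # 2018-04-30 ... 2021-03-31
recurring = month_end_periods(date(2021, 3, 31), 12)     # 2020-04-30 ... 2021-03-31
```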

Data collection

To load subscription data into Lifetime Analytics, the Operator collects the expected data each month from its own BSS and OSS systems and produces monthly data files. Each file contains one record per technical product, organized as a flat hierarchy from the Customer ID down to the Technical Product ID.

Subscription Hierarchy : Customer ID - Contract ID - Marketing Product ID - Technical Product ID

The Customer ID is the unique ID of the subscriber (person, professional or company).

The Marketing Product ID identifies the commercial offer subscribed by the customer, which the customer can cancel. The price paid is attached to the Marketing Product ID. It is the cornerstone of Lifetime Analytics and is named Subscription ID in the application.

The Contract ID optionally groups several Marketing Product IDs into a bundle.

The Technical Product ID identifies a service delivered as part of the Marketing Product. Usage data are attached to the Technical Product ID.

Note: the data must use a flattened hierarchical format, i.e. one single row per Technical Product ID, with the Marketing Product and Customer level fields replicated on each row.

A single Technical Product ID can describe the usage of several different Revenue Generating Units (RGUs). For example, a 3P PTV+FV+MV Marketing Product can be described either as a single Technical Product ID or as multiple Technical Product IDs.

If a Marketing Product contains several RGUs of the same type (e.g. 2 SIM cards), each of these RGUs is expected to have its own Technical Product ID.
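The flattening described above can be sketched in Python as follows. It assumes a hypothetical nested extract (customer, contracts, marketing products, technical products); the field names such as `segment` and `monthly_price` are illustrative only and should be replaced by the FEATURE names from the Data Reference.

```python
from typing import Dict, List


def flatten(customer: Dict) -> List[Dict]:
    """Produce one row per Technical Product ID, replicating the
    Customer and Marketing Product fields on every row."""
    rows = []
    for contract in customer["contracts"]:
        for mp in contract["marketing_products"]:
            for tp in mp["technical_products"]:
                rows.append({
                    "customer_id": customer["customer_id"],
                    "customer_segment": customer["segment"],
                    "contract_id": contract["contract_id"],
                    "marketing_product_id": mp["marketing_product_id"],
                    "monthly_price": mp["monthly_price"],  # price sits at marketing product level
                    "technical_product_id": tp["technical_product_id"],
                    "rgu": tp["rgu"],
                })
    return rows


# Example: one Marketing Product with 2 SIM cards gives 2 rows.
customer = {
    "customer_id": "C1", "segment": "residential",
    "contracts": [{"contract_id": "K1", "marketing_products": [{
        "marketing_product_id": "MP1", "monthly_price": 39.99,
        "technical_products": [
            {"technical_product_id": "T1", "rgu": "MV"},
            {"technical_product_id": "T2", "rgu": "MV"},
        ],
    }]}],
}
rows = flatten(customer)  # 2 rows, one per SIM card
```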

State of the Marketing Product ID

Each Marketing Product ID has a state code defining the status of the subscription:

churn : the subscription was churned during the month
gross-adds : the subscription was activated during the month
BoP : the subscription already existed before the month (beginning of period)

Note: these exact state codes are required.
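One way for the Operator to derive the state code is from activation and deactivation dates. The Python sketch below assumes such dates are available in the BSS extract (they are hypothetical inputs here) and that a subscription activated and churned in the same month is reported as "churn"; both are assumptions to adapt to the Operator's own rules.

```python
from datetime import date
from typing import Optional


def state_code(period_end: date, activation: date,
               deactivation: Optional[date] = None) -> str:
    """Return "churn", "gross-adds" or "BoP" for the snapshot ending at period_end.
    Subscriptions deactivated before the period are assumed to be excluded upstream."""
    period_start = period_end.replace(day=1)
    if deactivation is not None and period_start <= deactivation <= period_end:
        return "churn"  # takes precedence over gross-adds (assumption)
    if period_start <= activation <= period_end:
        return "gross-adds"
    return "BoP"


march = date(2021, 3, 31)
state_code(march, date(2019, 5, 2))                     # "BoP"
state_code(march, date(2021, 3, 15))                    # "gross-adds"
state_code(march, date(2019, 5, 2), date(2021, 3, 20))  # "churn"
```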

Churn & deactivation origins

As Lifetime Analytics focuses on churn initiated by end customers, a subscription whose state is set to "churn" is assumed to be churned by the end customer, unless the data "Termination requested by customer" is set to False.
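This default rule can be written as a small Python function. The snake case field name `termination_requested_by_customer` is a hypothetical rendering of "Termination requested by customer", and treating a missing value (None) like True is this sketch's reading of "unless it is set to False".

```python
from typing import Optional


def is_customer_churn(state: str,
                      termination_requested_by_customer: Optional[bool]) -> bool:
    """A "churn" row counts as customer-initiated unless the flag is explicitly False."""
    if state != "churn":
        return False
    # None (missing or empty value) keeps the customer-initiated default.
    return termination_requested_by_customer is not False


is_customer_churn("churn", None)   # True
is_customer_churn("churn", False)  # False: operator-initiated deactivation
```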

Mobile handset

Based on the IMEI/TAC code, Lifetime Analytics can, if necessary, provide all the expected Mobile Handset data through a third-party database provider.
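The TAC (Type Allocation Code) is the first 8 digits of a 15-digit IMEI, and the 15th digit is a Luhn check digit. If only IMEIs are available, a minimal Python sketch such as the one below can validate them and extract the TAC before export; whether to send the TAC rather than the full IMEI is a choice for the Operator (see Data Privacy Management), not a requirement stated here.

```python
def luhn_valid(digits: str) -> bool:
    """Luhn checksum, as used for the check digit of a 15-digit IMEI."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def tac_from_imei(imei: str) -> str:
    """Return the 8-digit TAC of a 15-digit IMEI, after validating it."""
    imei = imei.strip()
    if not (imei.isdigit() and len(imei) == 15):
        raise ValueError("expected a 15-digit IMEI")
    if not luhn_valid(imei):
        raise ValueError("IMEI check digit is invalid")
    return imei[:8]


tac = tac_from_imei("490154203237518")  # "49015420"
```

IMEISV values (16 digits, no check digit) are rejected by this sketch and would need separate handling.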

Technical File format

The Lifetime Analytics application expects one or more data files (splits) per month (snapshot), reflecting the complete situation of all subscriptions at the end of the month.

Lifetime Analytics uses a flattened data model of the subscription hierarchy: a single data table at the granularity of the Technical Product ID. All data related to the Marketing Product ID and the Customer ID are replicated at the Technical Product ID level.

Lifetime Analytics can import the churn data defined above in the following formats:

CSV format (Gzip, Zip compression)
PARQUET (Snappy compression)
NDJSON (Gzip compression)

CSV format

For CSV, the expectations are:

Comma (,) separator
Field names use the “snake case” convention (underscore as separator), exactly as defined by the FEATURE attribute in the Data Reference
The first line (header line) contains the case-sensitive field names, exactly as defined by the FEATURE attribute in the Data Reference
Decimal numbers formatted as ##.##, with a point as decimal separator and no thousands separator
Dates following the ISO 8601 standard: YYYY-MM-DD
Double quotes around extended text fields (including fields containing reserved characters)
Booleans encoded as True/False, Yes/No or 0/1
Null values represented by empty/blank values

The column order does not matter, as the header line provides the field names.
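Besides the pandas read_csv pre-check mentioned in this section, a few of the rules above can be checked with the Python standard library alone. This is a minimal sketch, not the application's import validation; the caller must say which fields are dates and numbers, and the field names in the example are hypothetical.

```python
import csv
import io
import re
from typing import List

SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
DECIMAL = re.compile(r"-?\d+(\.\d+)?")  # point separator, no thousands separator


def precheck_csv(text: str, date_fields: List[str],
                 number_fields: List[str]) -> List[str]:
    """Return the problems found in a CSV snapshot (empty list if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(text), delimiter=",", quotechar='"')
    for name in reader.fieldnames or []:
        if not SNAKE_CASE.fullmatch(name):
            problems.append(f"header {name!r} is not snake_case")
    checks = [(f, ISO_DATE, "YYYY-MM-DD") for f in date_fields]
    checks += [(f, DECIMAL, "a ##.## decimal") for f in number_fields]
    for record_no, row in enumerate(reader, start=1):
        for field, rule, label in checks:
            value = row.get(field) or ""  # empty means null, which is allowed
            if value and not rule.fullmatch(value):
                problems.append(f"record {record_no}: {field}={value!r} is not {label}")
    return problems


sample = ('customer_id,activation_date,monthly_price\n'
          'C1,2021-03-01,39.99\n'
          'C2,01/03/2021,"1,000.00"\n')
issues = precheck_csv(sample, ["activation_date"], ["monthly_price"])  # 2 problems on record 2
```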

Comment: the CSV format can be pre-checked using the Python pandas read_csv function.[3]

[3] https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

PARQUET format

For PARQUET (the preferred format for size and performance), the expectations are:

Binary file in the Apache Parquet format, e.g. as written with the Apache Arrow libraries[1]

Comment: the PARQUET format can be pre-checked using the Python pandas read_parquet function.[2]

[1] https://arrow.apache.org/

[2] https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html

JSON / NDJSON format

For NDJSON, the expectations are:

Each line is a valid JSON record; records are separated by a line break (LF or CRLF)
Field names use the “snake case” convention (underscore as separator), exactly as defined by the FEATURE attribute in the Data Reference
Decimal numbers formatted as ##.##, with a point as decimal separator and no thousands separator
Dates following the ISO 8601 standard: YYYY-MM-DD
Fields with a null value are optional
All fields are at the same level: no nested objects or arrays
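A minimal Python sketch of an NDJSON writer that follows these rules with the standard json module. Dropping fields whose value is None is one possible reading of "fields with a null value are optional" (writing an explicit null is the other); dates are assumed to be already formatted as YYYY-MM-DD strings, and the field names are hypothetical.

```python
import json
from typing import Iterable


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialize flat records as NDJSON: one JSON object per line.
    Fields set to None are omitted; nested objects or arrays are rejected."""
    lines = []
    for record in records:
        for key, value in record.items():
            if isinstance(value, (dict, list)):
                raise ValueError(f"nested value not allowed in field {key!r}")
        flat = {k: v for k, v in record.items() if v is not None}
        lines.append(json.dumps(flat, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


payload = to_ndjson([
    {"customer_id": "C1", "monthly_price": 39.99, "churn_date": None},
    {"customer_id": "C2", "activation_date": "2021-03-01"},
])
```

The resulting text can be checked with the pandas read_json function using `lines=True`.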

Comment: the NDJSON format can be pre-checked using the Python pandas read_json function (with lines=True).[4]

[4] https://pandas.pydata.org/docs/reference/api/pandas.read_json.html

File transfer

The import can be performed with several smaller files (preferred method) or with one single file (max 5 GB). All files must use the same file format and follow a consistent file name pattern (glob pattern), e.g. 2021-03-31.part1.csv.gz.
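To check that a set of split files follows one consistent glob pattern, the Python standard library fnmatch module can select the files of a given period. In this sketch the pattern `<period>.part*<extension>` is a hypothetical generalization of the example name above, not a pattern imposed by the application.

```python
from fnmatch import fnmatchcase
from typing import List


def files_for_period(filenames: List[str], period: str, extension: str) -> List[str]:
    """Select the split files of one monthly snapshot, e.g. every
    '2021-03-31.part*.csv.gz' file, sorted by name."""
    pattern = f"{period}.part*{extension}"
    return sorted(name for name in filenames if fnmatchcase(name, pattern))


names = [
    "2021-03-31.part2.csv.gz",
    "2021-03-31.part1.csv.gz",
    "2021-02-28.part1.csv.gz",   # previous period: excluded
    "2021-03-31.part1.parquet",  # other format: excluded
]
march_files = files_for_period(names, "2021-03-31", ".csv.gz")
```

Files in another format are excluded so that a single import never mixes formats; note that sorting is by name, so part10 sorts before part2.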

The submission process can be handled:

Manually through the Lifetime Analytics application UI (max 5 GB) (see How to import Subscription Data),
Automatically through the LTA SDK/CLI (See Automation)

For data privacy considerations, please see Data Privacy Management.