
Cognizant Artificial Intelligence Virtual Experience

I recently participated in the Cognizant Virtual Experience program, where I took on the role of a data scientist. My task was to assist "Gala Groceries," a hypothetical technology-led grocery store chain in the USA, in resolving their supply chain issues. This project allowed me to apply my skills in data analysis, machine learning, and business communication to a practical scenario.


Exploratory Data Analysis

 

My first step was to understand the data provided by Gala Groceries. Using Python, I explored multiple datasets, including sales transactions, customer demographics, and sensor data. Here's what I did (a short code sketch follows the list):

 

1. Data Loading and Analysis: I used pandas to load each dataset and examine its structure, including the number of non-null values and the data type of each column.

 

2. Data Quality Check: I checked for missing values, inconsistent formats, and duplicates across all datasets.

 

3. Statistical Analysis: I performed basic statistical analysis to understand the distribution of key variables. For instance, I examined the distribution of order statuses, brands, and product lines.
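
A minimal sketch of this first pass is shown below. The file names and column layout are illustrative assumptions, not the exact files from the project:

import pandas as pd

# Load the datasets provided for the task (file names are illustrative)
datasets = {
    "transactions": pd.read_csv("transactions.csv"),
    "customers": pd.read_csv("customer_demographics.csv"),
    "sensors": pd.read_csv("sensor_data.csv"),
}

for name, df in datasets.items():
    print(f"--- {name} ---")
    df.info()                                 # column dtypes and non-null counts
    print(df.isnull().sum())                  # missing values per column
    print("duplicates:", df.duplicated().sum())
    print(df.describe(include="all"))         # distribution of each variable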


Data Cleaning and Pre-processing

 

After understanding the data, I moved on to cleaning and pre-processing (see the sketch after this list):

 

1. Date Formatting: I converted date columns to the appropriate datetime format.

 

2. Handling Missing Values: I addressed missing values in various columns, either by filling them with appropriate values or removing them when necessary.

 

3. Encoding Categorical Variables: I encoded categorical variables like gender, deceased_indicator, and owns_car into numerical formats for machine learning compatibility.
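
Below is a sketch of these three steps, assuming the cleaning targets the customer demographics table; the column names (DOB, job_title) and value labels are assumptions rather than the project's exact schema:

import pandas as pd

customers = pd.read_csv("customer_demographics.csv")

# 1. Date formatting: parse date-of-birth strings into datetime (column name assumed)
customers["DOB"] = pd.to_datetime(customers["DOB"], errors="coerce")

# 2. Missing values: fill with a sensible default, or drop rows where a value is essential
customers["job_title"] = customers["job_title"].fillna("Unknown")   # assumed column
customers = customers.dropna(subset=["DOB"])

# 3. Encode categorical variables into numeric form (value labels assumed)
customers["gender"] = customers["gender"].map({"Male": 0, "Female": 1})
customers["deceased_indicator"] = customers["deceased_indicator"].map({"N": 0, "Y": 1})
customers["owns_car"] = customers["owns_car"].map({"No": 0, "Yes": 1})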

 

 

Feature Engineering

 

To enrich the analysis, I created new features (a sketch follows the list):

 

1. RFM Analysis: I calculated customer recency, frequency, and monetary (RFM) values.

2. Age Calculation: I derived customer age from their date of birth.

3. Data Integration: I merged different datasets to create a comprehensive customer profile.
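
A sketch of the feature engineering, building on the cleaned customers frame from the previous snippet; the transaction column names (transaction_date, transaction_id, list_price, customer_id) are assumptions:

import pandas as pd

transactions = pd.read_csv("transactions.csv", parse_dates=["transaction_date"])

# 1. RFM: recency (days since last purchase), frequency (order count),
#    and monetary value (total spend) per customer
snapshot = transactions["transaction_date"].max()
rfm = (
    transactions.groupby("customer_id")
    .agg(
        recency=("transaction_date", lambda d: (snapshot - d.max()).days),
        frequency=("transaction_id", "count"),
        monetary=("list_price", "sum"),
    )
    .reset_index()
)

# 2. Age derived from date of birth
customers["age"] = (pd.Timestamp("today") - customers["DOB"]).dt.days // 365

# 3. Merge into a single, comprehensive customer profile
profile = customers.merge(rfm, on="customer_id", how="left")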

 

 

Data Visualisation

 

Next, I used matplotlib and seaborn to create various visualisations (sketched below the list):

 

1. Age Distribution: I compared age distributions between old and new customers.

2. Gender Analysis: I analysed gender distribution and its relation to purchasing behaviour.

3. Geographical Insights: I examined the impact of factors such as state on customer transactions.

4. Industry Category Analysis: I investigated job industry categories and their prevalence among customers.
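
A rough sketch of these plots, using the merged profile frame from the previous snippet; the customer_group, state, and job_industry_category columns are assumed names:

import matplotlib.pyplot as plt
import seaborn as sns

# 1. Age distribution, split by old vs. new customers (column name assumed)
sns.histplot(data=profile, x="age", hue="customer_group", kde=True)
plt.title("Age distribution: old vs. new customers")
plt.show()

# 2. Gender vs. purchasing behaviour (average spend)
sns.barplot(data=profile, x="gender", y="monetary")
plt.title("Average monetary value by gender")
plt.show()

# 3 & 4. Transactions by state, and prevalence of job industry categories
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.countplot(data=profile, x="state", ax=axes[0])
sns.countplot(data=profile, y="job_industry_category", ax=axes[1],
              order=profile["job_industry_category"].value_counts().index)
plt.tight_layout()
plt.show()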

 

 

Predictive Modelling

 

Finally, I built a Random Forest Regressor model to predict stock levels (a sketch follows the list):

 

1. Data Preparation: I prepared the data by splitting it into features (X) and target variable (y).

2. Cross-Validation: I used cross-validation to ensure robust model performance.

3. Model Training and Evaluation: I trained the model and evaluated it using Mean Absolute Error (MAE).
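
A minimal sketch of the modelling step. The merged feature table (merged_features.csv) and the target column name (estimated_stock_pct) are assumptions here, not the project's exact artefacts:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Illustrative: load the merged feature table built in the earlier steps
model_df = pd.read_csv("merged_features.csv")

# 1. Split into features (X) and target variable (y)
X = model_df.drop(columns=["estimated_stock_pct"])
y = model_df["estimated_stock_pct"]

# 2. Cross-validation scored with (negative) Mean Absolute Error
model = RandomForestRegressor(n_estimators=100, random_state=42)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print(f"Mean MAE across folds: {-scores.mean():.3f}")

# 3. Fit on all the data once cross-validated performance looks acceptable
model.fit(X, y)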

 

Using the trained model's feature importances, I found that (see the sketch below):

  • Temperature and unit price were important predictors of stock levels

  • The hour of day was also an important predictor of stock
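
These drivers come from the model's built-in feature importances; a small sketch of how they can be read off the fitted scikit-learn model and feature matrix from the previous snippet:

import pandas as pd

# Rank features by the importance the fitted Random Forest assigns them
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))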

 


Conclusion

 

This experience with the Cognizant AI Virtual Experience strengthened my skills in data cleaning, integration, analysis, and visualisation using Python. I'm excited to apply these skills to future projects and keep building on them!


You can find my Python code here.
