Past data stories
What do Amazon customers think of Alexa?
Project Summary
Context
When Amazon customers purchase an Amazon Alexa, a voice-controlled assistant, some of them may choose to leave a review and give the product a rating. The rating ranges from 1 star to 5 stars. A sample of 3,150 customer reviews written between May and July 2018 (data source here) was selected for analysis.
Problem
Help Amazon solve the following challenges:
It has been discovered that all verified_reviews that say only “love it” or consist of a single word are actually fake. Reviews like “great!” or “Love It!” need to be removed from the dataset.
We need to see how many customers left a review each day in the sample we have
We need to see the number of reviews classified by rating
Can we predict the sentiment of Amazon Alexa reviews given the data at hand?
Solution
Python code using the pandas library found reviews that had two whitespaces and removed them
Many of the reviews in the sample dataset were written towards the end of July 2018
Many of the reviews were 5-star reviews, indicating overwhelmingly positive sentiment
Yes, we can, using the CatBoost classifier, an open-source library created by Yandex
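The first three steps can be sketched with pandas as follows. This is a minimal illustration and not the code from the project notebook: the sample rows are made up, the `date` and `rating` column names are assumptions, and the hypothetical `is_fake` helper applies the rule as worded in the problem statement (single-word reviews, or reviews that say only “love it”) rather than the whitespace-based check mentioned above.

```python
import pandas as pd

# Made-up sample rows; `date` and `rating` are assumed column names,
# `verified_reviews` is the field named in the problem statement.
reviews = pd.DataFrame(
    {
        "date": ["2018-07-29", "2018-07-30", "2018-07-30", "2018-07-31", "2018-07-31"],
        "rating": [5, 5, 4, 5, 1],
        "verified_reviews": [
            "Love it!",
            "Great!",
            "Works well with my smart lights",
            "love it",
            "Stopped working after two weeks",
        ],
    }
)


def is_fake(text: str) -> bool:
    """Hypothetical rule: one word (or empty), or only the phrase 'love it'."""
    words = text.split()
    normalized = " ".join(word.strip("!.?,").lower() for word in words)
    return len(words) <= 1 or normalized == "love it"


# Keep only the reviews that are not flagged as fake.
genuine = reviews[~reviews["verified_reviews"].map(is_fake)].copy()
genuine["date"] = pd.to_datetime(genuine["date"])

# Number of reviews per day and per star rating.
reviews_per_day = genuine.groupby(genuine["date"].dt.date).size()
reviews_per_rating = genuine["rating"].value_counts().sort_index()

print(reviews_per_day)
print(reviews_per_rating)
```

Real data would also need missing review text handled (for example with `fillna("")`) before applying the helper.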
Resources
My project notebook can be found here
Who is the perfect Marketing Researcher?
Project Summary
Context
Data is disrupting professions everywhere, and academia is fighting to keep up (or is it already too late?). As a result, many job seekers are looking to move into research-oriented roles that can make their professions better. Marketers such as myself seeking to work in the niche area of marketing research need a clear and current understanding of how they should package themselves to become attractive to the “global” employer. Sample data was obtained from Indeed.com (for the US market) in January 2020 to answer:
What skills and competencies are needed in Marketing Researchers?
What are the general characteristics of employers seeking this skillset?
Problem
The problem can be broken down as follows:
No job seeker can read every job posting available at a given time
Keywords that the job seeker should take note of need to be identified
Skills and competencies needed in the role need to be identified so that the job seeker knows which areas to upskill in
The general characteristics of employers seeking this skillset need to be identified
Solution
With Python, the data available at a given point in time was mined and analyzed, allowing the job seeker to “read” 900+ job postings at once
Nouns tended to give the most complete information compared to verbs or pronouns when used in n-grams (n being the number of words analyzed as a phrase).
Skills and competencies emphasised by employers were:
At least 3 years of working experience
Soft skills, particularly the ability to manage, communicate, prioritize and multi-task all at once
Prior experience in Marketing Research
Prior experience in hands-on Marketing
Most of the frequent expressions described candidate characteristics rather than employer characteristics
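The n-gram counting described above can be illustrated with a short standard-library sketch. It is an assumption-laden toy rather than the project's pipeline: the three postings are invented, the hypothetical `ngrams` helper uses a simple regular-expression tokenizer, and the part-of-speech filtering (keeping nouns) is left out because it would need an NLP library.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """Return all phrases of n consecutive lowercase words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented job-posting snippets, for illustration only.
postings = [
    "Prior experience in marketing research required",
    "Three years of marketing research experience preferred",
    "Strong communication skills and marketing research background",
]

# Count every bigram (n = 2) across all postings.
counts = Counter(gram for posting in postings for gram in ngrams(posting, 2))

print(counts.most_common(3))
```

Ranking the most frequent phrases this way is what lets a job seeker skim hundreds of postings at once; larger values of n surface longer expressions at the cost of lower counts.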
Resources
The project notebook can be found here and the project report can be found here
How can we assess the welfare of the poor when they borrow?
Project Summary
Context
Kiva.org is an online crowdfunding platform that extends financial services to poor and financially excluded people around the world. Kiva lenders have provided over 1 billion US dollars in loans to over 2 million people. In order to set investment priorities, help inform lenders, and understand their target communities, knowing the level of poverty of each borrower is critical.
Problem
The problem can be broken down as follows:
What is the welfare of Kiva borrowers, relative to their purchasing power?
In which communities is Kiva having the most impact?
For the locations in which Kiva has active loans, the objective is to pair Kiva’s data with additional data sources to estimate the welfare level of borrowers in specific regions, based on shared economic and demographic characteristics.
Solution
The welfare of Kiva borrowers varies overall
Few borrowers are taking loans for personal use, and that’s a good thing
A good number of the most valuable loans (up to 50,000 US dollars) are taken up by social impact startups, which are in turn working with local communities; this makes impact more indirect
Countries with many more people in need, such as Burkina Faso, Sierra Leone and Mali, have a high debt burden, trapping borrowers from these countries in poverty even when they borrow small amounts
Massive impact is being achieved for Kenyan men and Filipino women, because the loans are small enough to pay back and monthly repayments are taking up only 50% or less of their monthly income
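The repayment-to-income comparison behind the last point can be written as a small calculation. This is only a sketch: the hypothetical `repayment_burden` helper ignores interest and fees, assumes equal monthly instalments, and the figures are invented rather than taken from the Kiva data.

```python
def repayment_burden(loan_amount: float, term_months: int, monthly_income: float) -> float:
    """Share of monthly income taken up by an equal-instalment repayment."""
    monthly_repayment = loan_amount / term_months
    return monthly_repayment / monthly_income


# Invented example: a 300 USD loan repaid over 12 months on a 100 USD monthly income.
burden = repayment_burden(loan_amount=300, term_months=12, monthly_income=100)

# The 50% threshold mirrors the figure quoted in the findings above.
is_manageable = burden <= 0.5

print(f"Repayment burden: {burden:.0%}, manageable: {is_manageable}")
```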
Resources
The project report can be found here
How can we learn more about Kenyans?
Project Summary
Context
Kenya is a linguistically diverse country with over 40 languages, but current NLP solutions serving her inhabitants do not capture this linguistic diversity. Furthermore, many Kenyans code-switch, particularly in informal settings, and this requires a fresh approach to conducting NLP tasks on data generated by Kenyans.
Problem
There is a need to build a Natural Language Processing solution for multilingual, code-switching cultures.
Solution
This NLP solution can be built using data from Kenyans who
are active on the Twitter platform and
unite under the #KOT hashtag.
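Selecting posts that carry the #KOT hashtag could look like the sketch below. It assumes the tweets are already available as plain strings; the example tweets are made up, and the hypothetical `has_hashtag` helper is an illustration, not part of the project's toolbox.

```python
import re


def has_hashtag(text: str, tag: str) -> bool:
    """True if the text contains #tag as a whole hashtag, ignoring case."""
    pattern = rf"(?<!\w)#{re.escape(tag)}(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Made-up tweets; the first mixes Swahili and English to mimic code-switching.
tweets = [
    "Leo ni Friday, twende kazi #KOT",
    "Traffic on Mombasa Road is crazy today #kot #Nairobi",
    "Reading about #KOTLIN coroutines",
]

kot_tweets = [tweet for tweet in tweets if has_hashtag(tweet, "KOT")]

print(kot_tweets)
```

The lookarounds keep longer hashtags that merely start with the same letters, such as #KOTLIN, out of the result.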
Resources
My solution was presented in the form of a talk (which you can watch below), a Jupyter notebook and slides
Listen to the project abstract below:
I also discussed the NLP toolbox which was used to create this solution at Lanfrica: