jnyh

for more interesting reads https://medium.com/@jnyh | for more interesting videos https://www.cinnamon.video/jnyh

Cartoon characters, movies, TV shows, comics, video games and more — can you guess what the colorful blocks depict?

Minimalist Block Characters is a free and unique trivia game that will tickle your brain!

Your favorite cartoon characters, movies, stars, comics, video games — and much more — have been turned into colorful blocks.

Can you name all the movies and TV shows in these blocks?

With nothing but the colorful blocks as clues, you will have to know the characters pretty well to name them all. The aim is to identify all 20 of the famous films and television series from the colored blocks. Treat each block as a character.

The colors in those blocks are typical of what that character wears in the show. Here is a clue:

Here is one of the answers:

4 — winnie the pooh

Can you guess the characters?

Here's a clue for #22:

You can view all the questions and answers in my original post:

https://medium.com/@jnyh/blocks-quiz-7b94a7a05606

Next 24 characters can be found in Part II.

Cartoon characters, movies, TV shows, comics, video games and more — can you guess what the colorful blocks depict?

Blocks Quiz is a free and unique trivia game that will tickle your brain!

Your favorite cartoon characters, movies, stars, comics, video games — and much more — have been turned into colorful blocks.

The aim is to identify all of the famous films and television series from the colored blocks. Treat each block as a character. The colors in those blocks are typical of what that character wears in the show. With nothing but the colorful blocks as clues, you will have to know the characters pretty well to name them all.

Can you name all the movies and TV shows in these blocks?

Here are some clues:

Put them all together and you should have the cast of one of your favourite TV shows or films.

-———-

Round 2

Ready for another round? Let's go!

Need a hint? (17)

My answers are collected from some friends and fans.

Perhaps you could let me know if you think likewise or disagree?

That’s all folks!

4 Lessons on Personal and Team Effectiveness

This corporate version of the familiar childhood story has 4 episodes.

Episode 1

Once upon a time a turtle and a rabbit had an argument about who was faster. They decided to settle the argument with a race. The rabbit was complacent and fell asleep, and the turtle won the race.

What is the moral of the story?

“Slow-and-steady wins the race”

Episode 2

The rabbit was disappointed at losing the race and he did some thinking. He thought he should not lose in a race like this.

He challenged the turtle to another race. The turtle agreed. This time the rabbit remained focused and finished the race way ahead of the turtle.

What is the moral of the story?

“Fast-and-consistent will beat the slow-and-steady”

Episode 3

The turtle did some thinking this time, and realized that there was no way he could beat the rabbit in a race like this.

He thought for a while, and then challenged the rabbit to another race, but this time the turtle got to design the race route. The rabbit sped off just as focused as in the last race, but arrived at a broad river that he could not cross. The turtle arrived later, swam across the river and reached the finishing line.

What is the moral of the story?

“First identify your strengths and then change the playing field to suit your core competence”

Episode 4

The rabbit and the turtle, by this time, had become pretty good friends and they did some thinking together. Both realized that the last race could have been run much better.

So they decided to run the race again, but this time as a team. The rabbit carried the turtle for the first part of the route, then the turtle swam across the river with the rabbit on its back, and finally they finished the race together.

What is the moral of the story?

“The team is stronger than the sum of its components; only working in a team can harness each other’s core competencies”

Summary

• Compete against the situation, not against a rival

• Never give up when faced with failure; do some self-reflection

• Work to your competencies; turn the playing field to your advantage

• Pooling resources and working as a team will beat individual performers

Agile principles and priorities can be applied to the way you write

Project management methodologies are commonly used to get projects done or a product produced. They are, in general, processes and frameworks which break down the overall objective into individual tasks organized on a timeline. This can be adapted to guide the way you write.

In the past, the traditional Waterfall methodology (dating back to 1970) was very popular. It defines all requirements and parameters of the product at the start, so that the project team can work towards this target in sequential phases. This method suits the manufacturing industry, where product specifications seldom vary with time. It requires very extensive upfront planning, and ideally, the output product is exactly the same as specified in the beginning.

As the Waterfall methodology proved unsuitable for fast-changing requirements, many other project management methodologies emerged over the years, especially in the software development industry. Let me share the most popular one.

Waterfall Methodology vs Agile Methodology. Figure by the author.

Agile Framework

Agile is a way of working introduced in 2001, and is widely used to manage software development projects. It is suitable for fast-paced development cycles and allows specifications to change during the design and build process. It is flexible, and strives for iterative, incremental improvement in the product through team collaboration. In short, Agile is to plan, build, test, learn, repeat.

Agile teams are responsive to the unpredictable requirements (as the project unfolds) through iterative work processes. Below are Agile principles which serve as guidelines to the team’s way of working:

· Customer satisfaction through early and continuous software delivery

· Accommodate changing requirements throughout the development process

· Frequent delivery of working software, as the working software is the primary measure of progress

· Collaboration and interaction between the business stakeholders (client) and developers (vendor) throughout the project, including face-to-face communication within the development team

· Support, trust, and motivate the people involved

· Agile frameworks to support a consistent development pace

· Attention to technical detail and design enhances agility

· Simplicity in looking for solutions

· Regular reflections in the self-organizing team on how to become more effective

Agile projects are characterized by a series of tasks that are conceived, executed and adapted as the situation demands. However, Agile’s focus is not on what to do, but on how to think. Agile values and places priority on:

· Individuals and interactions (rather than processes and tools)

· Working software (rather than comprehensive documentation)

· Customer collaboration (rather than contract negotiation)

· Responding to change (rather than following a predefined rigid plan)

Agile way of working. Figure by the author.

Agile principles and priorities are employed for greater productivity, and most of them could also be leveraged for writing articles or fiction stories. If you are looking for a guide, you could try Agile and adapt it to shape your writing works. I will explain the main Agile working practices (Scrum framework), and how they can be applied to writing:

1. Define the business need and the project objective. This is usually driven by the product owner who is responsible for the product features and quality. It is the big picture stuff, but this is the core belief that you will refer back to as you build. In writing, the product owner could be the client, the publication, or the target audience (for example, Medium readers). Understand what the product owner wants to read and tailor the article or fiction story to meet their needs.

2. Build the backlog. Focusing on the user requirements (“user stories” in Agile), a list of tasks is derived to build the product features or to improve the product performance. The development team builds the backlog together with the product owner to determine the feature objectives and performance targets. In writing, the development team refers to the author, co-author, proofreader and editor. The backlog could be a list of topics to be covered in the article, or scenes to flow in the fiction story.

3. Prioritize the backlog. Identify the backlog tasks which will bring the most value with minimal effort. Not every topic or scene is worth writing, so cover the main ones first. Once the main ones are conveyed, you might find that the remaining ones are not as important as initially thought.

4. Do a sprint (the actual development work). Sprints are usually two-week cycles in which high-priority tasks on the backlog are worked on. In writing, each sprint could be several writing sessions spanning a couple of days. Plan the outline or scope to be covered. During the sprint, pick a prioritized topic or scene and let the thoughts flow in your writing.

5. Have daily standups. Standup meetings are for team members to be accountable to one another on their progress in the current sprint. Each team member takes a turn to report status: what was done the day before, what to do today, and any potential obstacles. If you are the only author, you can still have a daily standup on your own to evaluate your writing progress. You can reflect on the previous day’s work and plan for the current day.

6. Review the sprint output (sprint review meeting). At the end of two weeks, there should be a functional output for the project team to demonstrate, with an incremental improvement in the product. Get feedback from client stakeholders and prepare for the next sprint. As the author, you are familiar with what you are trying to convey. When you have the client or a peer (editor or proofreader) read the draft, you become aware of what is actually received. Regular feedback is a key principle of the Agile way of iterative, incremental improvement.

7. Prepare for the next sprint. Identify what is going well so that you keep doing it, and identify the impediments to be removed. Once again, build and prioritize the backlog so that the next sprint can work on the next improvement areas.

8. Roll out the final product. When all stakeholders agree that no more improvement is needed in the product, it is ready to be deployed. This is when you get so tired of running through your article or fiction story and you never want to look at it again.
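The prioritization in step 3 can be sketched in a few lines of Python. This is only an illustration: the item titles and the value and effort scores are hypothetical, and ranking by value-to-effort ratio is one simple heuristic, not a rule prescribed by Agile.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    title: str   # topic or scene to write (hypothetical examples below)
    value: int   # estimated value to the reader, 1 (low) to 10 (high)
    effort: int  # estimated writing effort, 1 (low) to 10 (high)


def prioritize(backlog):
    """Return the items sorted by value-to-effort ratio, highest first."""
    return sorted(backlog, key=lambda item: item.value / item.effort, reverse=True)


backlog = [
    BacklogItem("Background on Waterfall", value=4, effort=4),
    BacklogItem("How sprints map to writing sessions", value=9, effort=3),
    BacklogItem("Glossary of Scrum terms", value=3, effort=6),
]

ranked = prioritize(backlog)
for item in ranked:
    print(f"{item.title}: ratio {item.value / item.effort:.2f}")
```

The items at the top of the ranked list are the ones to pick up in the next sprint; the ones at the bottom may turn out not to be worth writing at all.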

Challenges with the client

Besides having adequate communication between the development team and the client, the client’s expectations have to be managed. All clients generally love the idea that Agile is flexible, and that it grants them more opportunities to change their mind as the project develops. However, they might not realize that such flexibility is also costly in both time and money. Here are some things you should do:

(i) Get the client to understand that flexibility is inevitably expensive. For example, a flexible full-fare economy ticket which allows itinerary changes costs much more than a fixed one. Changing a decision also means that the client pays for the time and effort already spent.

(ii) Set the client’s expectation to commit time for frequent sprint review meetings (e.g. every two weeks) to evaluate the completed sprints. On top of that, the client representative in each meeting needs to be empowered by higher management to make decisions on product specifications. For Agile to work, the client needs to provide continuous feedback and priority setting to keep the project moving.

(iii) Earn the client’s trust that each iteration is done with the best possible efforts to deliver value and improve the product. While holding the decision making power, the client also expects an iteration to have tremendous improvement. Such imbalance in responsibility in the client-vendor relationship should be converted to mutual trust and willingness to experiment together. Agile’s principle in collaboration means it is a team effort in both making decisions and delivering value.

Minimum viable product

One key feature of the Agile way of working is the development of a minimum viable product (MVP), which is the most fundamental configuration of the product. After the project objectives have been defined, a proposal is made on the approach to the problem including the MVP to be built within the shortest possible time (e.g. one month). The MVP has only the most important features, but its performance may not be the most optimal.

Putting a less-than-finished version in front of the client to test seems very risky, so the team (including the client) has to be prepared for it. The purpose is to make the MVP work, test it, and see whether it is heading in the right direction to solve the problem and thus help the business case. The MVP will get better, because the development team uses what they have learnt from the MVP feedback to build an improved version. Agile is about continuously deploying and learning from your mistakes, and working with the client to make the product better.

Agile is to plan, build, test, learn, repeat

In writing, the MVP can be regarded as the quick first draft: a skeletal flow with key points. Considering the writing objective required by the product owner, an ambitious proposal is great, but build an MVP first to get instant feedback. The idea is, once you have a topic in mind and some ideas on how to approach the topic using what you know, write something that is the “bare-minimum” first. Then let people vet it and get feedback to know whether you are on the right track with the right content.

Summary

Being “agile” (flexible) means adopting a dynamic approach to planning and adapting to the changing needs of a new situation when it arises. The Agile environment favors quick action: fail quickly, evaluate and learn, then try again using a different approach or an improved method. It works well in dynamic environments where requirements are likely to change or evolve.

All the best to your writing projects!

Linear, Ridge, LASSO, Polynomial Regression

Python code is available on my GitHub.

I was exploring the dengue trend in Singapore after a recent spike in dengue cases, especially in the Dengue Red Zone where I live. However, the raw data was not available on the NEA website.

I was wondering, has dengue affected the life expectancy of people in any country?

Do people in rich nations live longer?

What are the factors affecting life expectancy of a country?

So I explored life expectancy and looked for data on the following aspects (features):

· Birth Rate

· Cancer Rate

· Dengue Cases

· Environmental Performance Index (EPI)

· Gross Domestic Product (GDP)

· Health Expenditure

· Heart Disease Rate

· Population

· Area

· Population Density

· Stroke Rate

Target is Life Expectancy, measured in number of years.

The assumptions are:

1. These are country-level averages

2. There is no distinction between male and female

Data Science Process

The following data science process has been employed:

· data collection, data cleaning, Exploratory Data Analysis

· feature selection, feature engineering

· model selection, model tuning and hyperparameter tuning

· model optimisation based on selected performance metric

Tools used for this analysis include:

· Python libraries particularly Numpy and Pandas for manipulating data structures

· Matplotlib and Seaborn for visualisation

· Scikit-Learn and Statsmodels for regression analysis

Exploratory Data Analysis

First I check for multi-collinearity between features.

import seaborn as sns

sns.set(rc={'figure.figsize': (10, 7)})

sns.heatmap(df.corr(), cmap="seismic", annot=True, vmin=-1, vmax=1)

There seems to be some strong collinearity, denoted by boxes in dark red and dark blue as you can see in the image below.

For example, countries that spend more on health tend to have a higher EPI score; where health expenditure is higher, the stroke rate is lower; and a larger area goes with a larger population.

How about the correlation between features and target?

To live a long life, you should have a low stroke rate, high health expenditure, take good care of the environment, and have fewer babies (according to the correlation chart below).
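A minimal sketch of how the feature-to-target correlations could be ranked with pandas. The small DataFrame here holds made-up illustration values, not the dataset used in this analysis; only the column names follow the article.

```python
import pandas as pd

# Made-up illustrative values; not the real dataset
df_demo = pd.DataFrame({
    "Birth Rate":      [10.0, 12.0, 20.0, 35.0, 40.0],
    "EPI":             [85.0, 80.0, 60.0, 45.0, 40.0],
    "Stroke Rate":     [30.0, 35.0, 70.0, 110.0, 120.0],
    "Life Expectancy": [83.0, 81.0, 72.0, 60.0, 55.0],
})

# Pearson correlation of every feature with the target, most negative first
target_corr = (
    df_demo.corr()["Life Expectancy"]
    .drop("Life Expectancy")
    .sort_values()
)
print(target_corr)
```

Features near -1 or +1 move most closely with life expectancy; a correlation on its own does not say which way the cause runs.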

Let’s look at the initial pair plot.

sns.pairplot(df, height=1.5, aspect=1.5)

There seems to be a need to remove outliers in many features, e.g. Dengue Cases, GDP, Population, Area, and Population Density.

Each outlier is replaced by the next highest value in the column. After removing the outliers, the plots are still skewed to the right (points are very concentrated on the left side). This suggests that some transformation might be needed.

Another way to tame outliers is to apply the LOG function, which compresses large values and spreads out the data concentrated on the left.
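The two outlier treatments described above can be sketched as follows. The article does not say how outliers were identified, so the 1.5 × IQR rule used here is an assumption, and the sample values are made up.

```python
import numpy as np
import pandas as pd


def cap_high_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Replace high outliers with the largest non-outlier value in the column.

    Outliers are flagged with the common k * IQR rule (an assumption here).
    """
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    upper_fence = q3 + k * (q3 - q1)
    next_highest = s[s <= upper_fence].max()
    return s.clip(upper=next_highest)


# Made-up column with one extreme value
demo = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])

capped = cap_high_outliers(demo)   # the 100.0 is pulled down to 5.0
logged = np.log(demo)              # alternative: compress large values instead

print(capped.tolist())
print(logged.round(2).tolist())
```

The log transform only works for strictly positive values; a column containing zeros would need something like `np.log1p` instead.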

Feature Selection

To look for significant features, I dropped one feature at a time to see its impact on the simple regression model. Looking at the R² Score, these 3 features (Birth Rate, EPI, Stroke Rate) are chosen, because the model will be adversely affected without them.
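A rough sketch of this drop-one-feature check, using synthetic data in place of the real dataset. The feature names `x0`, `x1`, `x2` are placeholders, and the R² is measured on the training data for simplicity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
# Synthetic data: the target depends on x0 and x1 but not on x2
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
names = ["x0", "x1", "x2"]

full_r2 = LinearRegression().fit(X, y).score(X, y)
print(f"all features: R² = {full_r2:.3f}")

r2_drop = {}
for i, name in enumerate(names):
    X_reduced = np.delete(X, i, axis=1)          # leave one feature out
    r2 = LinearRegression().fit(X_reduced, y).score(X_reduced, y)
    r2_drop[name] = full_r2 - r2                 # R² lost without this feature
    print(f"without {name}: R² = {r2:.3f} (drop of {r2_drop[name]:.3f})")
```

A feature whose removal barely changes R² is a candidate to leave out; one whose removal causes a large drop is worth keeping.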

Next, I removed outliers and reviewed the p-values in Statsmodels. I gained one more significant feature (Population Density). When the p-value of a feature is less than 0.05, it is considered statistically significant, as I have chosen 5% as the significance level.

After that, I applied the LOG function to all features, and gained 4 more significant features (GDP, Heart Disease Rate, Population, Area).

I also tried other transformations (e.g. Reciprocal, Power 2, Square Root) but there was no further improvement.

Features can also be selected using the LassoCV feature in SkLearn.
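A minimal sketch of LassoCV-based selection, again on synthetic data with placeholder feature names. Standardising the features first is an assumption made here so that the L1 penalty treats all features on the same scale.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 200
# Synthetic data: only x0 and x1 drive the target
X = rng.normal(size=(n, 4))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
names = ["x0", "x1", "x2", "x3"]

X_scaled = StandardScaler().fit_transform(X)

# LassoCV picks the penalty strength alpha by 5-fold cross-validation
lasso = LassoCV(cv=5, random_state=0).fit(X_scaled, y)

# Features whose coefficients were not shrunk to zero are kept
selected = [name for name, coef in zip(names, lasso.coef_) if abs(coef) > 1e-6]
print("chosen alpha:", lasso.alpha_)
print("coefficients:", dict(zip(names, lasso.coef_.round(3))))
print("selected:", selected)
```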

Finally I looked at the pair plot again with all significant features. The scatter plots are now nicely spread out with some clear trends.

Model Selection

I am now ready to fit the following models on the train data set:

· Linear Regression (a straight line which approximates the relationship between the independent variables, i.e. the features, and the dependent target variable)

· Ridge Regression (this reduces model complexity by shrinking the coefficients while keeping all of them in the model, known as the L2 penalty)

· LASSO Regression (Least Absolute Shrinkage and Selection Operator reduces model complexity by shrinking coefficients, some of them all the way to zero, i.e. the L1 penalty)

· Degree 2 Polynomial Regression (a curved line to approximate the relationship between the independent variables and the dependent target variable)

I have also validated their performance on the validation data set. The simple linear regression model seems to have the potential to be the best performing model.
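The comparison of the four models can be sketched as below. The data is synthetic, and the `alpha` values, the 80/20 split and the scaling step are illustrative assumptions rather than the settings used in this project.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
# Synthetic stand-in for the feature matrix and target
X = rng.normal(size=(250, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=250)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear": LinearRegression(),
    "Ridge (L2)": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "LASSO (L1)": make_pipeline(StandardScaler(), Lasso(alpha=0.01)),
    "Polynomial (degree 2)": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
}

# Fit each model on the train set and score it on the validation set
val_r2 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_r2[name] = model.score(X_val, y_val)
    print(f"{name}: validation R² = {val_r2[name]:.3f}")
```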

This is confirmed by Cross Validation using KFold (with 5 splits).

Finally, I checked the residuals against the model assumptions. The residuals should be normally distributed with equal variance around a mean of zero. The Normal Quantile-Quantile (Q-Q) plot also looks acceptably normal.
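A small sketch of such a residual check on synthetic data. It fits a straight line with NumPy and uses `scipy.stats.probplot` to compute the Q-Q points; the data and the single feature are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic linear data with normally distributed noise
x = rng.uniform(0, 10, size=200)
y = 1.5 * x + 4.0 + rng.normal(scale=1.0, size=200)

# Ordinary least-squares line and its residuals
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# probplot returns the Q-Q points and a straight-line fit through them;
# r close to 1 means the residuals follow the normal quantiles closely
(_, _), (_, _, r) = stats.probplot(residuals, dist="norm")
print(f"mean residual = {residuals.mean():.3f}, Q-Q correlation r = {r:.3f}")
```

Plotting the residuals against the fitted values is the usual companion check for equal variance.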

Since I only have 250 rows (data limited by the number of countries in the world), I used the entire data set to simulate the test data set (note: this is done for academic purpose, not practical as it will lead to data leakage). I used KFold Cross Validation with 10 splits to evaluate the model performance.

from sklearn.linear_model import LinearRegression

from sklearn.model_selection import cross_val_score

from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=1)  # shuffled folds, fixed seed for reproducibility

lm = LinearRegression()

lm.fit(X_train, y_train)

cvs_lm = cross_val_score(lm, X, y, cv=kf, scoring='r2')  # one R² score per fold

print(cvs_lm)

There is quite some variation in the R² values from 0.49 to 0.82, but the average result is around 0.69 which is quite satisfactory.

Model Interpretation

How do we interpret the model?

import numpy as np

import pandas as pd

import statsmodels.api as sm

df = pd.read_csv('df3.csv')

X = df[['Birth Rate', 'EPI', 'GDP', 'Heart Disease Rate', 'Population', 'Area', 'Pop Density', 'Stroke Rate']].astype(float)

X = np.log(X)  # log-transform every feature

y = df['Life Expectancy'].astype(float)

X = sm.add_constant(X)  # add the intercept column

model = sm.OLS(y, X)

results = model.fit()

results.summary()

With all the (log-transformed) features at zero, the baseline life expectancy is 62 years. If your country has a low birth rate, add 5 more years to your life.

If the EPI (Environment Performance Index) is high, add 8 more years to your life.

If you live in a rich country, add half a year to your life. Finally for every unit (or rather LOG unit) decrease in stroke rate, 5 more years could be added to your life.
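Because the features were log-transformed, one "LOG unit" corresponds to multiplying or dividing the raw feature by e (about 2.72). The sketch below works through what that means for stroke rate. The coefficient of -5 is read off the interpretation above and is an approximation, not the exact fitted value.

```python
import math

# Approximate coefficient on ln(Stroke Rate), taken from the statement
# "5 more years per LOG unit decrease"; an assumption, not the fitted value
b_stroke = -5.0


def change_in_life_expectancy(coef, ratio):
    """Change in the target when a log-transformed feature is multiplied by `ratio`."""
    return coef * math.log(ratio)


per_log_unit = change_in_life_expectancy(b_stroke, 1 / math.e)  # divided by e
halved = change_in_life_expectancy(b_stroke, 0.5)               # stroke rate halved
ten_pct = change_in_life_expectancy(b_stroke, 0.9)              # 10% lower

print(f"one LOG unit lower: {per_log_unit:+.2f} years")
print(f"stroke rate halved: {halved:+.2f} years")
print(f"stroke rate 10% lower: {ten_pct:+.2f} years")
```

Under this reading, halving the stroke rate goes with roughly 3.5 more years, and a 10% reduction with about half a year.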

Next Steps

I could possibly collect more data by expanding the scope to cities instead of countries, and to explore other features (factors) affecting life expectancy. Also, I could split the data to male and female categories for such life expectancy regression analysis.

To conclude, here are some interesting insights:

1. Japan has the highest life expectancy (83.7 years). Central African Republic (49.5 years) and many countries in the African continent are at the bottom of scale. Singapore is ranked #5 (82.7 years).

**2. Take good care of the environment. It has the largest coefficient (impact) on the country’s life expectancy.**

The Python code for the above analysis is available on my GitHub. Do feel free to refer to it.

https://github.com/JNYH/Project-Luther

Video presentation: https://youtu.be/gC2m_lvouu8

Thank you for reading.

People commonly say that we should not bottle up feelings because it is unhealthy for our wellbeing. For me, my passion for data science is overflowing, and I am in need of an outlet.

The purpose of this article is to share my exploration journey, just like how we do Transfer Learning for machine learning problems.

“Do not waste time reinventing the wheel, learn from those who have done it before.” — JN

Writing data science articles has taught me to organise my thoughts and choose my words properly. It invokes curiosity, arouses emotions and triggers creativity amongst other expressive advantages.

My blogging journey

Around August 2019, I published a data science article on Medium.com. At that time I was quite satisfied with a peak article view by 70 users on the first day and around 20 views per day subsequently. I thought there must be a bigger data science community out there which I could reach out to.

I remember reading very good articles in the past somewhere, and started to dive into many different publications. After a month of exploration, I eventually became a writer for Towards Data Science and my first article was accepted on 05 October by the publication.

My jaw dropped when I saw the stats: 352 views in a day! That is a huge chunk of my cumulative views over the past month, accomplished in a day!

To a creator of any work, it is very rewarding to have the work seen and recognised. It is not a matter of pride or selfishness; we are psychologically programmed to need respect and affection from everyone around us. People yearn for

■ claps on Medium,

■ likes on Facebook,

■ shares on LinkedIn.

While it is comforting to be recognised, please do not rely upon it to prop up your self-esteem.

Example of another article

On 13 October, I published Locate V-beat in Electrocardiogram (ECG) using machine learning and image pre-processing.

The article was immediately featured on the homepage under the “Latest” section and yielded 372 views on that day. The reach came from internal Medium distribution, such as the Medium homepage, app, emails, and social media accounts, alongside external references and search engines. The advantage of being part of Medium’s metered paywall is that curators can recommend my articles to interested readers.

To date, I have published 5 articles on Towards Data Science, with 50 article views daily at steady state (without any boosting or new articles). There is usually a small increase when I share or boost my articles on social media.

Tip: To boost article views, include related Hashtags which are trending, for example #customer and #healthtechnology, to be picked up by search engines.

Here comes the main course

If you are new to Medium, or would like to start to do some serious blogging for an accomplished publication, this 4-step section will show you how it can be done.

**1. Join the Medium Partner Program**

Sign up for a Medium account and read this article for new writers. To join the Medium Partner Program, you can click on this link.

**2. Be an official writer for Towards Data Science**

In this article for aspiring writers, the esteemed publication asks you to send an email to publication@towardsdatascience.com and include the items below:

(a) a link to your article (Medium draft preferred, or published on Medium)

(b) a link to your LinkedIn profile

**3. Set up a new Stripe account (https://stripe.com)**

Register a new Stripe account here. You have to link it to your local bank account, so that Stripe can transfer the money received from Medium to that bank account.

Tip: The service fee to pay Stripe is 3.4% + $0.50 per successful card charge. To reduce the frequency of incurring service fee, change the payout schedule setting to “monthly” for recurring payout sent automatically to your bank account.

After setting up, you can access your Stripe dashboard here.

**4. Set up Medium Partner Program dashboard**

You need to do 2 things on the Medium Partner Program dashboard:

(a) to direct payment to Stripe, activate your Medium account via a ‘Submit’ button. After that you have to log in to your Stripe account to set it up to expect payment from new account “Medium”.

(b) to submit your taxpayer information before payment can be received. This has to be done within 1 month from joining the partner program or else your articles will cease to accumulate earnings; and earnings will be forfeited if not done within 180 days.

Tip: For non-USA residents, use the W-8BEN Form to submit Taxpayer information (3 simple steps).

After that you will receive an email from Medium that all is set up and done, so you can look forward to the residual income flowing into your bank account monthly.

These are 2 unavoidable things in life — death and taxes

From this month (November 2019) onwards, Medium has updated the way writer earnings are calculated. The major change is to use member’s reading time instead of article’s number of claps as the main signal. You can read more about it here.

Tip: Remember to click on the ‘3dots’ menu options before you publish your article

What else is cooking?

Apart from working on data science projects, my other hobby is doing problem solving on HackerRank to sharpen my skills.

my profile: https://www.hackerrank.com/jnyh1

Coding makes me a logical thinker, or rather being a logical thinker I could “speak the snake language” (Python) and empathise with how the computer thinks. You can read this in my next blog — Applying Statistics in Python - part I. Feel free to connect with me if you are like-minded and would like to discover solutions together.

“As iron sharpens iron, we can only get better!”

Thank you for reading!

How to create QR Codes for contacts (phone number and email) and QR Codes for websites (web address URL)

Efficiency is doing things right. Effectiveness is doing the right things.

I am an advocate for efficiency and effectiveness. There must be a more efficient way to share contact details other than manually typing details into the mobile phone when I meet a new business contact.

Add a new contact on your mobile phone by scanning a QR Code

Google launched the Google Contacts App in 2017, and users could then share contacts by creating a QR Code. Simply scan the QR Code to save the contact on your phone. I personally think that such an efficient way to save contact details should be implemented on name cards and marketing brochures.

The trend did not seem to take off, maybe because people do not know how to create the QR Codes in the first place.

To create a list of customised contacts QR Codes

I made use of Google Sheets template to generate the contact QR Codes. Open the template (template link here) on another tab. Please click on “File –> Make a copy” to save it on your own “My Drive” (Google Drive).

Note that this Google Sheets template seems to work only on desktop, not on mobile phone.

You can use this template by updating First Name, Last Name, Mobile Phone number and Email address. The contact QR Code will be generated in the next column based on these 4 fields.

=image("https://chart.googleapis.com/chart?chs=150x150&cht=qr&chl=BEGIN:VCARD%0AN:" & A3 & "%20" & B3 & "%0ATEL;CELL:" & C3 & "%0AEMAIL:" & D3 & "%0AEND:VCARD")
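For readers who prefer a script to a spreadsheet, here is a minimal Python sketch that assembles the same vCard text and URL as the formula. The contact details are placeholders and the function names are made up for this example. The endpoint is simply copied from the formula; the sketch does not check whether it is still being served.

```python
from urllib.parse import quote


def vcard_payload(first, last, phone, email):
    """Build the same minimal vCard text that the sheet formula assembles."""
    return "\n".join([
        "BEGIN:VCARD",
        f"N:{first} {last}",
        f"TEL;CELL:{phone}",
        f"EMAIL:{email}",
        "END:VCARD",
    ])


def qr_image_url(payload, size=150):
    """URL-encode the payload and append it to the chart endpoint from the formula."""
    base = "https://chart.googleapis.com/chart"
    return f"{base}?chs={size}x{size}&cht=qr&chl={quote(payload, safe='')}"


# Placeholder contact details (hypothetical)
payload = vcard_payload("Jane", "Doe", "+6512345678", "jane@example.com")
url = qr_image_url(payload)
print(url)
```

Encoding the whole payload with `quote` turns line breaks into `%0A` and spaces into `%20`, as in the formula, and also escapes characters such as `+` and `@` that could otherwise be misread inside a URL.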

Another person can scan the QR Code to add the contact details to his/her mobile phone. New iOS and Android versions are equipped with a QR Code scanner in camera mode.

Scan QR code to save contact. After scanning, simply click on “Save” to add to Contacts.

This contact list QR Code template will be useful when you meet new people in a team or at a trade show and want to gather everyone’s contact details.

To create a single customised contacts QR Code

Go to QR Code Generator, and select ‘vCard’ where you can customise various fields. Remember to test it out, as some fields do not allow special characters, e.g. a comma “,” or the @ symbol.

You can also add this QR Code to your business cards and marketing brochures. Customers and business counterparts can then easily scan and save your contact details to their mobile phones.

Go to a website (web address URL) by scanning a QR Code

Newer versions of iPhone and Android phones are equipped with QR Code scanning in the camera function. Simply turn on the camera and hover over the QR Code to scan it, then click the popup to go to the embedded web address URL. For example, try scanning the QR Code below:

Web URL embedded https://www.qrcode-monkey.com

To create your own customised websites QR Codes

I usually go to QR Code Monkey to create a customised QR Code, and the website URL address is embedded in the QR Code above. It is friendly and free to use, and there are more customised options if you prefer:
· add a logo image in the middle (this can be your company logo!)
· set a color (to follow your corporate identity)
· other customised design

So now you can easily create marketing materials with QR Code of your company website.

QR Codes for name cards and marketing brochures

I hope the above hacks are useful to get things done more efficiently and effectively. All the best to your marketing and outreach efforts!

Thank you for reading!

People are really creative when it comes to making jokes about the coronavirus situation

The COVID-19 coronavirus has made us think that this is such a weird time we are living in. People started WFH (working from home):

-—

Many companies struggled to keep themselves afloat:

-—

And some hidden personalities started to appear:

-—

On 14 Feb 2020 (Valentine's Day), someone suggested this bouquet would be a valuable gift:

-—

People made amazing choices when in love:

-—

There was panic buying everywhere. Interestingly, the first item to go out of stock was TP, also known as TOILET PAPER. So Maslow's Hierarchy of Needs became:

-—

And I wonder why TP can be so tempting in some parts of the world:

-—

LEGO product management was quick in extending their collections:

-—

Marketing people also rolled out their creative juices:

-—

But a man still makes his decisive choice:

-—

The coronavirus became very ferocious:

-—

The USA seems to have more COVID-19 cases than Japan, so I went on to investigate why. The reason is that they wear their masks differently:

-—

On 16 Mar 2020, President Donald Trump announced a lockdown in California, and everyone started queuing up to buy TP everywhere, especially in this place:

With a video to display the magnitude of the matter:

-—

He started making many comments, and people also started commenting about him:

-—

President Trump said the virus came from China, so the term “Chinese Virus” became quite sensitive:

-—

In Russia, there is quite some motivation for people to stay at home:

-—

In Malaysia, we have some very special talent:

(need to be familiar with the Malay term 'bomoh')

-—

A new currency also emerged in Malaysia:

Translation: The bill was RM$2.80. So the patron paid with 2 masks worth RM$3.00, and the stall owner returned RM$0.20 with 2 squares of toilet paper.

-—

People started to keep a distance from one another:

-—

In Singapore, “social distancing” became a buzzword everywhere, even when you stand in a queue:

In some other parts of Singapore, people do not need lines on the ground for social distancing in a queue.

-—

Then McDonald's decided to join the crowd:

-—

And many others also joined the crowd:

During these tough times, you could be a hero when you stay away:

-—

Finally, if you are still deciding whether to go out for drinks or somewhere for a vacation, here are some choices for you:

.......

The threat is real, and very scary. Here is a self-interview by a coronavirus patient, and I hope she is still around to read this post:

Final words

None of us is stronger than the virus, so please take care everyone!

Agile principles and values can be applied to the way you approach data science projects

Project management methodologies are commonly used to get projects done or a product (also referred to as a tool) produced. They are, in general, processes and frameworks which break down the overall objective into individual tasks organised on a timeline. They can be adapted for data science projects.

In the past, the traditional Waterfall methodology (dating back to the 1970s) was very popular. It defines all requirements and parameters of the product at the start, so that the project team can work towards this target in sequential phases. This method is well suited to the manufacturing industry, where product specifications seldom vary with time. It requires very extensive upfront planning, and ideally, the output product is exactly the same as specified in the beginning.

As the Waterfall methodology proved unsuitable for projects whose requirements keep changing, many other project management methodologies have emerged over the years, especially in the software development industry. Let me share the most popular one.

Agile Framework

Agile is a way of working developed in 2001, and is widely used to manage software development projects. It is suitable for fast-paced development cycles and makes provision for specification changes during the design and build process. It is flexible, and strives for iterative incremental improvement in the product through team collaboration. In short, Agile is to plan, build, test, learn, repeat.

Agile teams are responsive to the unpredictable requirements as the project unfolds, through iterative work processes. Below are Agile principles which serve as a framework (guideline) to the way of working:

• Customer satisfaction through early and continuous software delivery

• Accommodate changing requirements throughout the development process

• Frequent delivery of working software, as the working software is the primary measure of progress

• Collaboration and interaction between the business stakeholders (client) and developers (vendor) throughout the project, including face-to-face communication within the development team

• Support, trust, and motivate the people involved

• Agile frameworks to support a consistent development pace

• Attention to technical detail and design enhances agility

• Simplicity in looking for solutions

• Regular reflections in the self-organising team on how to become more effective

Agile projects are characterized by a series of tasks that are conceived, executed and adapted as the situation demands. However, the focus of Agile is not on what to do, but on how to think; it values and places priority on:

• Individuals and interactions (rather than processes and tools)

• Working software (rather than comprehensive documentation)

• Customer collaboration (rather than contract negotiation)

• Responding to change (rather than following a predetermined/fixed plan)

Agile principles and priorities were devised to make software teams more productive, but most of them can be leveraged for data science (DS) projects. Data scientists often find it hard to schedule a project, because it is nearly impossible to fix a specific timeline for “research” and exploratory work. Most DS projects require trial and error by going down different paths and trying different techniques. There is little certainty about the output, so Agile is well suited to direct the workflow.

Most other projects deal with what customers want, what the developers want, and what the business seeks. When working with DS, another perspective is added: what the data is telling you. Data scientists cannot make any sense out of the data unless they develop a basic understanding of it. There is a lot of investigation, exploration, testing and tuning. Agile uses iteration and constant feedback to refine a system under development, and so to move up the Data-Value Pyramid.

When working on DS projects, insights are not immediately achievable; multiple iterations are needed before any insights can be discovered. I will explain the main Agile working practices (Scrum framework), and how they can be applied to DS:

1. Define the business need and the project objective. This is usually driven by the product owner who is responsible for the product features and quality. It is the big picture stuff, but this is the core belief that you will refer back to as you build. In DS, the product owner could be the client, the business, or the end customer (for example, end user of a prediction tool). Understand what problems the product owner is facing and tailor the project proposal to meet their needs.

2. Build the backlog. Focusing on the user requirements (“user stories” in Agile), a list of tasks is derived to build product features or improve product performance. The DS team builds the backlog together with the product owner to determine the product features and performance targets. The backlog could start with getting the data into a structured form before it can be analysed. Then it could be a list for feature selection or feature engineering, or a list of models to select, tune and optimise.

3. Prioritise the backlog. Identify the backlog tasks which will bring the most value with the least effort. In DS, not every approach is worth trying, so cover the most promising ones first. When the main ones are covered, you might find that the remaining ones are not as important as initially thought.

4. Do a sprint (the actual development work). Sprints are usually two-week cycles where high priority tasks on the backlog are worked on. In DS, each sprint could be two to four weeks depending on the team size. During the sprint, always complete the task with the highest priority before moving on to the next in line.

5. Have daily standups. Standup meetings are for team members to be accountable to one another on their progress in the current sprint. Team members take turns to report their status: what was done the day before, what to do today, and any potential obstacles. The most effective communication happens when DS team members meet face-to-face to share their work.

6. Review the sprint output (sprint review meeting). At the end of the sprint, there should be a functional output for the project team to demonstrate, with an incremental improvement in the product. Data scientists should share the outputs before trying to perfect the processes. Get feedback from client stakeholders and prepare for the next sprint. Regular feedback is a key principle for the Agile way of iterative incremental improvement.

7. Prepare for the next sprint. Identify what is going well so the team keeps doing it, and identify the impediments to be removed. It is important to understand that, unlike software development, DS is more experiment based than task based. DS explores data, so it should be treated as a series of research experiments. Once again, build and prioritise the backlog so that the next sprint can work on the next improvement areas.

8. Roll out the final product. When all stakeholders agree that no more improvement is needed in the product, it is ready for the final deployment. DS projects follow the “law of diminishing improvement”. For example, if a model has achieved 70% accuracy, the next 5–10% improvement will take a lot more effort than before, and it also depends on the limitations in the data set. Decide in the team whether the efforts are worth the incremental improvement.
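The prioritisation in step 3 (most value for the least effort) can be sketched as a simple ranking by value-to-effort ratio. This is only an illustration under my own assumptions: the `BacklogItem` class, the scoring scales and the sample tasks are hypothetical, and a real team would agree on the value and effort estimates together with the product owner.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    value: int   # estimated benefit, e.g. 1 (low) to 10 (high)
    effort: int  # estimated cost, e.g. in person-days


def prioritise(backlog: list[BacklogItem]) -> list[BacklogItem]:
    """Order items by value per unit of effort, highest first."""
    return sorted(backlog, key=lambda item: item.value / item.effort, reverse=True)


# Made-up DS backlog for illustration.
backlog = [
    BacklogItem("Tune gradient boosting model", value=6, effort=5),
    BacklogItem("Clean and structure raw data", value=9, effort=3),
    BacklogItem("Engineer date features", value=4, effort=1),
]

for item in prioritise(backlog):
    print(f"{item.name}: {item.value / item.effort:.2f}")
```

A single ratio is a crude measure, since it ignores dependencies (data usually has to be structured before any modelling), so treat the ranking as a starting point for discussion rather than a fixed order.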

Challenges with the client

Besides having adequate communication between the DS team and the client, the client’s expectations have to be managed. All clients generally love the idea that Agile is flexible, and that it grants them more opportunities to change their mind as the project develops. However, they might not realise that such flexibility is also costly in both time and money. Here are some things you should do:

(I) Get the client to understand that flexibility is inevitably expensive. A flexible full-fare economy ticket which allows itinerary changes costs much more than a fixed one. Changing a decision also means that the client pays for the time and effort already spent on the discarded work.

(II) Set the client’s expectation to commit time for regular sprint review meetings (e.g. every two weeks) to evaluate the completed sprints. In addition, the client representative in each meeting needs to be empowered by higher management to make decisions on product specifications. For Agile to work, the client needs to provide continuous feedback and priority setting to keep the project moving.

(III) Earn the client’s trust that each iteration is done with the best possible efforts to deliver value and improve the product. While holding the decision-making power, the client may also expect every iteration to deliver tremendous improvement. Such an imbalance of responsibility in the client-vendor relationship should be converted to mutual trust and willingness to experiment together. Agile’s principle of collaboration means it is a team effort in both making decisions and delivering value.

Minimum Viable Product

One key feature of the Agile way of working is the development of a minimum viable product (MVP), which is the most fundamental configuration of the product (or tool). After the project objectives have been defined, a proposal is made on the approach to the problem including the MVP to be built within the shortest possible time, e.g. one month for DS projects. The MVP has only the most important functionalities, but its performance may not be the most optimal.

Putting a less-than-finished version in front of the client for testing may seem very risky, so the team (including the client) has to be prepared for it. The purpose is to make the MVP work, test it, and see whether it is heading in the correct direction for solving the problem and thus helping the business case. The MVP will get better, because the DS team is going to use what they have learnt from the MVP feedback to build an improved version. Agile is about continuously deploying and learning from your mistakes, and working with the client to make the product better.

Agile is to plan, build, test, learn, repeat

DS project deliverables

The Agile way of working allows data scientists to prioritize and create roadmaps based on requirements and goals. With each iteration, data scientists can learn something new, get more refined results, and build on them for the next incremental improvement. Below are some Agile project deliverables that shape and guide the project process:

Project vision statement: A summary that articulates the goals for the project.

Project roadmap: The high-level view of the requirements needed to achieve the project vision.

Project backlog: Ordered by priority, this is the full list of what is needed to support your project.

Release plan: A timetable for the release of a working product (or tool), but not documentation. Projects should be self-documenting along the way.

Sprint backlog: The user stories (requirements), goals, and tasks linked to the current sprint.

Increment: The working product functionality that is presented to the stakeholders at the end of the sprint and could potentially be given to the client. The goal is not to deliver more but to get a higher value output.

Summary

Agile is likely to be adopted by more DS project teams in the near future. Many data scientists have reported that it makes them more productive. This is not because the data scientists have become more skillful, but because Agile can help them optimize their projects. Instead of spending time on models that are unlikely to yield useful results, it is better to spend that time on work that is more likely to deliver results.

Being “agile” (flexible) means you need to adopt a dynamic approach in planning and be adaptable to the changing needs of a new situation when it arises. The Agile environment calls for quick action: fail quickly, discuss and evaluate, then try again using a different approach or an improved method. It works great in dynamic environments where there is a potential for changing or evolving requirements.

All the best to your DS projects!

Reference:

Data-science? Agile? Cycles? My method for managing data-science projects in the Hi-tech industry.

A role that manages these stakeholders: end user, client, business, development team, as well as what the data is telling you

As data science technology grows and evolves, there are well-defined roles of data scientists, data analysts, and data engineers to meet the operational requirements. However, there is a certain class of data problems that is not a good match for any one of these roles alone.

While data scientists are deeply immersed in statistical models, data analysts are drowning in data wrangling and visualisation, and data engineers are buried in infrastructure, data whisperers have emerged to connect the resulting models to the actual business problems.

This new role is a bridge or gateway between the business and technical data teams. It is analogous to a process in traditional product design and realisation called Quality Function Deployment. During the formative years of my career at Philips Electronics, I translated qualitative consumer demands into quantitative parameters for technical design/production engineers.

In recent years, many have progressed from a data science practitioner to a managerial role such as a data whisperer. While the success at analytics does not guarantee success in managing the people doing analytics, there are advantages with a data science practitioner background. The ability to empathise with the people doing the work is already a great advantage against other generalist managers from other domains.

Technical Background

Technical skills do not translate to management competency, as they are different skillsets: making sense out of data vs optimising resources used to make sense out of data; data crunching skills vs leadership skills; being a data scientist vs managing a team of data scientists.

The good news is that a data whisperer can be a mentor to the team, and can choose to be hands-on for certain critical milestones of the project, especially since the mechanisms of each step in the workflow are so familiar:

· Exploratory Data Analysis (EDA): a significant amount of time is spent on the data cleaning and preparation stage, which is unproductive but necessary

· Feature Extraction, Engineering, Selection: this can be very time consuming, and requires some trial and error

· Model Selection and Optimisation: many iterations are involved while tuning the hyperparameters

· Performance Assessment: error analysis, the design of statistical tests and the understanding of which performance metrics to use

More emphasis could be placed on the project approach, and to direct the team to move in the correct direction. The many years of analytical data experience will complement the intuition to focus on certain areas.

Stakeholders Management

Engagement with business stakeholders is not restricted to project objectives and the presentation of final results. It also means managing on-going expectations at each project milestone, to allow technical teams the time and space to do their work.

Business and technical people operate on different planes and speak different languages both literally and figuratively. It is imperative for both parties to communicate in the same dimension, and it is more so when the work done by the technical team has to materialise as a presentation to the business team.

Crippling miscommunication can happen: business executives can become dismissive of the insights drawn when they cannot fully comprehend the data or accept the work done. Data professionals can be perceived as aloof or uninterested in the business problems of their non-technical counterparts. The person who can bridge this gap is the one who possesses a clear understanding of both worlds.

People are more willing to talk to you when they feel understood.

The challenge is to tell the data story that can be easily understood by business decision makers, and it requires good background concepts of data visualisation. Be creative as the presentation need not be restricted to a Microsoft PowerPoint file. I have witnessed the effectiveness of using physical aids like charts and demo kits.

Leadership

Among many leadership qualities such as motivation, inspiration, vision and passion, I have been inspired by and personally embrace these core values:

· Loyalty (to the cause)

· Leadership (by example)

· Discipline (refer to 7 habits of highly effective people)

· Professionalism (deal with the matter, not with the person)

· Fighting Spirit (determination)

· Ethics (integrity)

· Care (empathy for subordinates)

— adapted from SAF Core Values, 1996

A leader should possess a high Intelligence Quotient (天时), which is somewhat determined at the time a person is born, a high Emotional Quotient (人和) to maintain good relationships with people, and a high Adversity Quotient (地利) to handle adverse situations. These qualities echo the Chinese saying about a good time to go to war: 天时地利人和, which means that the time is right and that the geographical and social conditions are favorable.

Another good concept of leadership by Jim Collins is: leader of self (level 1), leader of teams (level 3), leader of leaders (level 5). One of the important goals in leadership should be to develop people with the long term goal of them becoming leaders eventually.

Project Management

Just as Moses led his people towards the promised land, the data science arena has its own ineludible promise:

“You shall be faced with exciting problems and brilliant people producing amazing results”

To begin with the end in mind is to first understand the business needs, so that the problem-solving approach and the project task can be scoped, and the right data can be pulled out to support the task. Work has to be prioritised such that it is a win-win situation for all parties.

Data whisperers understand the iterative nature of the data science workflow, as opposed to the traditional objective-deliverable-milestone project workflow. The Agile framework can be adopted to manage the iterations in the workflow, by monitoring the sprints and backlogs. However, it is also critical to have intimate knowledge of the inherent limitations of the dataset and any strong bias in it that hinders improvement, or else the iterations could go on indefinitely.
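One way to keep the iterations from running on is to agree on a stopping rule up front. Below is a minimal sketch of such a rule, assuming the team tracks one score (e.g. accuracy) per sprint. The function name, the `min_gain` and `patience` thresholds and the sample numbers are all hypothetical; the right values depend on the business case and on what the dataset can realistically support.

```python
def should_stop(scores: list[float], min_gain: float = 0.01, patience: int = 2) -> bool:
    """Return True when each of the last `patience` sprints improved the score by less than `min_gain`."""
    if len(scores) <= patience:
        return False  # not enough history to judge
    recent = scores[-(patience + 1):]
    # Gain of each sprint over the one before it.
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < min_gain for gain in gains)


# Made-up accuracy after each sprint.
accuracy_per_sprint = [0.62, 0.70, 0.74, 0.745, 0.748]
print(should_stop(accuracy_per_sprint))
```

A rule like this does not replace judgment; it simply prompts the team and the stakeholders to discuss whether further effort is still worth the incremental gain.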

The primary purpose of the project management role is to manage resources:

· time (schedule, work plan)

· money (budget and expenditure)

· people (to synergise with the right competence and skillsets)

· data (the “raw materials” for machine learning)

· capabilities (tools, infrastructure and data pipeline)

· sub-contractors (precious additional resource in critical path)

etc, and to lead/drive/motivate the team to get the great work done.

In a company, the most important resource that is underutilised is data, which in its raw form is analogous to crude oil. More of these questions should be asked:

· Where is data coming from? Who looks after data storage and security?

· How to make use of data? What decisions need to be made based on data?

There is a lot of planning involved in how data assets can be strategically gathered, stored and deployed. Beyond the quantity and type of data, planning also involves evaluating the data flow processes and storage infrastructure, so that strengths can be exploited and weaknesses mitigated.

Sharpening The Saw

In this digital knowledge era, everyone will need to be either digital or data literate, and even more so for leaders. I have personally benefitted a lot from an immersive data science boot camp in Singapore (by Metis).

“The illiterates of the 21st Century are NOT those who cannot read and write but those who cannot learn, unlearn and relearn.”

― Alvin Toffler (1928-2016)

In A Nutshell

Data science involves not only analytical and coding skills but also domain knowledge to understand how to deploy data to solve problems. Data whisperers possess these skills and additionally the soft skills required to speak intelligently to people for persuasion and adoption.

This article is dedicated to Ms Marvelous, a fabulous data whisperer.

References:

https://betanews.com/2016/07/11/the-data-whisperer/

https://www.datanami.com/2016/07/13/secrets-data-whisperer/

https://www.linkedin.com/pulse/five-signs-you-might-data-whisperer-allen-hillery/