5 Data Science Hot Shots!

The field of Data Science has grown tremendously over the past decade, thanks in no small part to the contributions of many significant individuals. Take a look at five key figures in Data Science and their pivotal contributions to the field!

Claude Shannon

Claude Shannon – father of Information Theory

Perhaps one of the most colorful personalities in the field of data science, Claude Shannon is well known as the father of Information Theory. Today, Shannon’s work underpins many Machine Learning methods (cross-entropy loss and the information gain used in decision trees are direct descendants), making it one of the foundations of Data Science. His seminal 1948 paper “A Mathematical Theory of Communication” catalyzed the start of the electronic communications age and has had a significant impact on the Telecommunications and Computing industries. Earlier, in his 1937 master’s thesis, Shannon showed that the yes-no states of telephone switching circuits could be expressed in two-valued Boolean algebra, where 1 means “on” and 0 means “off” on the switchboard.

Another important contribution by Shannon is the concept of entropy, which measures the average information content of a message: the more unpredictable the message, the higher its entropy. (This is distinct from the thermodynamic statement that the degree of randomness in a closed system tends to increase, although the two share a name and a mathematical form.) Shannon proved that even over a noisy channel, messages can be sent with an arbitrarily small error rate, as long as the transmission rate stays below the channel’s capacity. This is done by encoding the message with an error-correcting code that acts as a “self-checking” mechanism. He also showed that because language is redundant, many messages can be significantly shortened without losing their meaning. On how he settled on the term ‘Entropy’, this is what Shannon had to say:

“My greatest concern was what to call it. I thought of calling it ‘information,’ but the word was overly used, so I decided to call it ‘uncertainty.’ When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, ‘You should call it entropy, for two reasons. In the first place, your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more importantly, no one really knows what entropy really is, so in a debate, you will always have the advantage.’”
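To make Shannon’s entropy a little more concrete, here is a minimal Python sketch that estimates the entropy of a message, in bits per symbol, from how often each character appears. The function name `shannon_entropy` is only an illustration and does not come from Shannon’s paper, and treating each character as an independent symbol is a simplifying assumption.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Estimate entropy in bits per symbol from character frequencies.

    Assumption: each character is treated as an independent symbol,
    which ignores the structure of real language.
    """
    counts = Counter(message)
    total = len(message)
    # H = sum over symbols of p * log2(1 / p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    for text in ["aaaa", "abab", "abcd"]:
        print(text, shannon_entropy(text))
```

A message that repeats one symbol carries no surprise, so its entropy is zero, while a message that uses four symbols equally often needs two bits per symbol.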

John Tukey

John Tukey popularised the use of the Fast Fourier Transform (FFT) algorithm

Tukey is best known for the Fast Fourier Transform (FFT) algorithm that he published with James Cooley in 1965, which showed how to compute Fourier transforms efficiently on a computer. He also contributed substantially to the field of data analytics, having written the book “Exploratory Data Analysis”, and set out useful principles for the statistical analysis of data, which were neatly summarised by A. D. Gordon as:

“… the usefulness and limitation of mathematical statistics; the importance of having methods of statistical analysis that are robust to violations of the assumptions underlying their use; the need to amass experience of the behaviour of specific methods of analysis in order to provide guidance on their use; the importance of allowing the possibility of data’s influencing the choice of method by which they are analysed; the need for statisticians to reject the role of “guardian of proven truth”, and to resist attempts to provide once-for-all solutions and tidy over-unifications of the subject; the iterative nature of data analysis; implications of the increasing power, availability, and cheapness of computing facilities; the training of statisticians.”
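Returning to the FFT mentioned above, the core idea can be sketched in a few lines of Python: split the input into even-indexed and odd-indexed halves, transform each half, and combine the results. This is a textbook recursive radix-2 version written for illustration, assuming the input length is a power of two. It is not a reproduction of the original paper’s formulation, and the function name `fft` is chosen here.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT sketch; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor that rotates the odd half into place
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


if __name__ == "__main__":
    print(fft([1, 1, 1, 1]))
```

Because each level of recursion halves the problem, the work grows roughly as n log n instead of the n squared needed by a direct computation, which is what made the transform practical on the computers of the time.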

Andrey Markov

Andrey Markov the “militant academician”

Best known for his research in stochastic processes, Andrey Markov gave his name to the Markov Chain. A Markov Chain consists of a set of states and the transitions between them, where the probability of each transition depends only on the current state and not on the path taken to reach it. Markov Chains are widely used across academic fields, including linguistics, biology, and finance, to name just a few.
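As a small illustration of a Markov Chain, the Python sketch below simulates a two-state weather model. The states and the probabilities in `TRANSITIONS` are made up for this example, and `simulate` is a hypothetical helper name. The point is only that each step looks at the current state and nothing else.

```python
import random

# Hypothetical transition probabilities: TRANSITIONS[current][next]
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def simulate(start, steps, rng):
    """Walk the chain for `steps` transitions, starting from `start`."""
    state = start
    path = [state]
    for _ in range(steps):
        options = TRANSITIONS[state]
        # The next state is drawn using only the current state's row
        state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path


if __name__ == "__main__":
    print(simulate("sunny", 10, random.Random(0)))
```

Replacing the weather states with words, and the probabilities with counts of which word follows which in a body of text, gives the simplest kind of Markov text generator.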

As a mathematician and political activist, Markov recognised the practical importance of academia for solving real-world problems, having once remarked that:

“The alleged opinion that studies in seminars [in classes] are of the highest scientific nature, while exercises in solving problems are of the lowest [rank], is unfair. Mathematics to a considerable extent consists in solving problems, and together with proper discussion, this can be of the highest scientific nature while studies in … seminars might be of the lowest rank…”

His vocal activism on social and educational issues earned him the nicknames “Andrew the Furious” and the “militant academician” in the press.

Today, with the explosion of data, Markov chains underpin many real-world applications such as text generation, natural language processing, models of gene mutation, and financial modeling.

Thomas Bayes

Thomas Bayes was an English clergyman and mathematician who established the mathematical basis for probabilistic inference in what became known as Bayes’ Theorem. Bayes’ Theorem gives the probability of one event given that another has occurred, updating a prior belief in light of new evidence. Bayesian reasoning of this kind was central to Alan Turing’s work on cracking the German Enigma code during World War II.
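A short worked example may help show what the theorem does. The Python sketch below applies Bayes’ Theorem to a hypothetical screening test. The function name `posterior` and all of the numbers are assumptions picked for illustration, not data from any real test.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Probability of the condition given a positive test result.

    Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B), where
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / evidence


if __name__ == "__main__":
    # Hypothetical numbers: 1% of people have the condition, the test
    # detects 95% of true cases and wrongly flags 5% of healthy people.
    print(posterior(prior=0.01, true_positive_rate=0.95,
                    false_positive_rate=0.05))
```

With these made-up numbers, a positive result raises the probability from 1% to only about 16%, because false positives among the many healthy people outnumber the true positives.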


Sharon Bertsch McGrayne summed up the enduring nature of Bayes’ Theorem (and the Markov Chain) in “The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy”:

“The combination of Bayes and Markov Chain Monte Carlo has been called “arguably the most powerful mechanism ever created for processing data and knowledge.”
Almost instantaneously MCMC and Gibbs sampling changed statisticians’ entire method of attacking problems. In the words of Thomas Kuhn, it was a paradigm shift. MCMC solved real problems, used computer algorithms instead of theorems, and led statisticians and scientists into a world where “exact” meant “simulated” and repetitive computer operations replaced mathematical equations. It was a quantum leap in statistics.”
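To give a feel for what “repetitive computer operations” means here, the following Python sketch implements a random-walk Metropolis sampler, one of the simplest MCMC methods. It is an illustration only: the target (a standard normal distribution), the step size, and the names `metropolis` and `log_density` are all assumptions chosen for this example.

```python
import math
import random


def metropolis(log_density, start, steps, step_size, rng):
    """Random-walk Metropolis sampler for a one-dimensional target.

    `log_density` is the log of the target density, up to a constant.
    """
    x = start
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, target(proposal) / target(x))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed target: standard normal, log density -x^2 / 2 + constant
    draws = metropolis(lambda x: -0.5 * x * x, 0.0, 20000, 1.0, rng)
    print(sum(draws) / len(draws))
```

Each new sample depends only on the previous one, so the sequence of samples is itself a Markov Chain, which is where the two ideas in this article meet.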

Hans Rosling

Hans Rosling was also a sword swallower!

A physician and professor of international health who became famous for his work with statistics, Hans Rosling was also a showman who understood how “dry” data can seem and found the theatrics needed to make the world appreciate the value that data can bring. He entertained audiences by assuming the persona of a sports commentator during his TED talk, telling a story of how data can be used to dispel myths about population growth and economic progress in developing countries. He once quipped that:

“Forming your worldview by relying on the media would be like forming your view about me by looking only at a picture of my foot.” 

Hans Rosling’s TED talk
