
The 2015 Advancing Analytics IAPA National Conference, brought to us by ADMA, was a data-science-focused one-day conference that delivered global ideas for driving business forward through data and analytics.

Here are my “LOOSE” notes madly taken down while trying to take as much of the day in as possible.

I also photographed as many of the slides as I could; see the gallery at the bottom of this post.

Please excuse typos and half-finished sentences; I wanted to get this up ASAP (and up at all... I know what it’s like!).

Key takeaways from this day were:

  • challenge us to do / act
  • creative solutions come from places you don’t expect them to.
  • think creatively about everything
  • take something away from every single industry.

 

Massive datasets and huge-scale analytics: how CERN delivers insights that matter

Bob Jones, Project Leader CERN

 

The mission of CERN: push back the frontiers of knowledge, develop new technologies (the web and the grid system) and train the scientists and engineers of tomorrow.
Data is very expensive to reproduce, so we need to figure out scenarios to keep data long term.
Big data analytics is needed for more than 20 petabytes of data per year.
Applying techniques used for Large Hadron Collider (LHC) data analysis to your growing datasets.
The future is open – developing an open analytics platform via collaboration.
Bob Jones was the Head of CERN openlab between January 2012 and December 2014, for the CERN openlab fourth phase.

Providing insight into the evolving market of advanced analytics

 

Usama Fayyad, Chief Data Scientist Barclays Bank

 

Big data is a mix of structured, semi-structured and unstructured data – it typically breaks barriers.

Unstructured data is taking over.
Yottabyte – 10 to the power of 24 bytes

3 V’s of data

  • volume
  • velocity
  • variety

Blob – binary large object
Clob – character large object

Database doesn’t know what to do with it.

Data Axioms

  1. data gains value exponentially when integrated and conceals
  2. fusing data together from independent sources is difficult…
  3. standardisation is essential
  4. data governance is critical and policy must be centralised
  5. recency matters – data streaming in modelling
  6. data in structure needs
  7. data is a primary competency and not a side activity

To Hadoop or not to Hadoop.

  • built on standard hardware
  • fault tolerant
  • built as a search engine
  • drivers – cost of storage.
  • Hadoop is only $2k per terabyte

ETL – extract, transform and load:
replaces expensive licences
higher performance
flexibility on schema, taking both structured and unstructured data.
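For anyone newer to the term, the three ETL stages can be sketched in a few lines of Python. This is a minimal illustration, not something shown in the talk: the CSV data is made up, and an in-memory SQLite database stands in for the target store (the point made on the day was that Hadoop can take over this work from expensively licensed tools).

```python
import csv
import io
import sqlite3

# Made-up raw export: inconsistent case, stray whitespace, amounts stored as text.
RAW_CSV = """customer,amount
 Alice ,10.50
BOB,7
alice,2.50
"""

def extract(text):
    """Extract: read rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise customer names and parse amounts into numbers."""
    return [(r["customer"].strip().lower(), float(r["amount"])) for r in rows]

def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = dict(conn.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer"))
print(totals)
```

The transform step is where the "Alice" and "alice" rows get merged; without it the load would store two different customers.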

4th V of data
Value

NetSeer – a Google AdWords competitor that understands context better.

The connected cow:

every company is a big data company
data increased 70% chance of conception.
estrus rate

Sentiment analytics revelations: detecting emotions in your textual customer data

Stephen Pulman, Professor of Computational Linguistics Oxford University

Multi-dimensional sentiment analysis – what tools do we use?

  • linguistically based pattern matching (information extraction: resign / be sacked)
  • machine learning methods – train a classifier using annotated data: naive Bayes, support vector machines, averaged perceptron, neural networks.
  • currently fashionable – deep learning methods: convolutional neural nets, long short-term memory (LSTM) models
  • choice usually depends on annotated data – expensive and time-consuming to acquire
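To make the machine learning option concrete, here is a minimal multinomial naive Bayes classifier with add-one smoothing, written from scratch. It is a sketch under assumptions: the four annotated training sentences are made up, whereas real systems train on thousands of annotated texts, which is exactly the cost the last bullet warns about.

```python
import math
from collections import Counter

# Made-up annotated corpus: (text, label) pairs.
TRAIN = [
    ("great service and friendly staff", "pos"),
    ("loved the product, works great", "pos"),
    ("terrible support and slow delivery", "neg"),
    ("broken on arrival, terrible", "neg"),
]

def tokenize(text):
    return text.lower().replace(",", " ").split()

def train_nb(examples):
    """Count words per label and documents per label."""
    word_counts = {}        # label -> Counter of word frequencies
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, text):
    """Pick the label with the highest log prior + smoothed log likelihood."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_nb(TRAIN)
print(predict_nb(model, "great staff"))
print(predict_nb(model, "terrible delivery"))
```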

What is sentiment analysis:

Used to detect sentiment proper – positive, negative or neutral attitudes expressed in text

Sometimes factual statements imply sentiment.

Building a sentiment analysis system:
cheap and cheerful – take lists of positive and negative words and count the number of each in the text.
problems –
is it neutral only when the counts are even?
compositional sentiment – “not wonderfully interesting” contains two positive words but is negative overall
some words are positive only in some contexts – cold beer vs cold coffee
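A rough sketch of the cheap-and-cheerful approach, assuming tiny hypothetical word lists, reproduces two of the problems from the notes: a tie comes out neutral, and “not wonderfully interesting” is scored positive.

```python
# Hypothetical mini-lexicons; real systems use much larger curated word lists.
POSITIVE = {"wonderful", "wonderfully", "interesting", "good", "great"}
NEGATIVE = {"bad", "boring", "awful", "terrible"}

def lexicon_sentiment(text):
    """Count positive and negative words; the larger count wins, a tie is neutral."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(lexicon_sentiment("a good film"))                  # positive
print(lexicon_sentiment("not wonderfully interesting"))  # positive (wrong: "not" is ignored)
print(lexicon_sentiment("good start but awful ending"))  # neutral (counts are even)
```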

Version 2 – better
get a training corpus of texts that humans have annotated for sentiment
represent each text as a vector of counts of n-grams
should capture some compositional effects
will work for any language and domain
bag of words means structure is ignored
problems –
rich compositional effects – clever, too clever, not too clever (the last two are negative)
difficult to pick up mixed sentiment
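A minimal sketch of the version 2 representation: each text becomes counts of its unigrams and bigrams over a shared vocabulary. The helper names are hypothetical. Bigrams such as “not too” are what let a classifier capture some compositional effects, while anything outside a two-word window is still ignored.

```python
from collections import Counter

def ngram_counts(text, max_n=2):
    """Bag of n-grams: counts of every 1..max_n word sequence in the text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def vectorize(texts, max_n=2):
    """Map each text to a count vector over a shared, sorted n-gram vocabulary."""
    bags = [ngram_counts(t, max_n) for t in texts]
    vocab = sorted(set().union(*bags))
    return vocab, [[bag[g] for g in vocab] for bag in bags]

vocab, vectors = vectorize(["too clever", "not too clever"])
print(vocab)    # ['clever', 'not', 'not too', 'too', 'too clever']
print(vectors)  # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 1]]
```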

Version 3 – best – 80–90% accuracy
do as full a parse as possible of the input texts
use syntax to do compositional sentiment analysis

Sentiment logic rules
kill + negative → positive
kill + positive + negative → negative
problems
extra-linguistic context dependence (cold, wicked sick) – you need to understand the author to interpret the language
can’t deal with reader perspective: “oil prices are down” is good for me, bad for oil company shareholders
can’t deal with sarcasm or irony
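The sentiment logic rules can be sketched as a bottom-up pass over a parse. This assumes the parse is already available as nested (head, argument) tuples, and that words like “kill” and “not” simply reverse the polarity of what they govern, which is how the first rule reads; the lexicon and reverser list are hypothetical, and this is not the system presented in the talk.

```python
# Hypothetical prior polarities: +1 positive, -1 negative (unlisted words are neutral).
LEXICON = {"disease": -1, "pain": -1, "hope": +1, "interesting": +1}
# Hypothetical reversers: words that flip the polarity of the phrase they govern.
REVERSERS = {"kill", "not"}

def polarity(node):
    """Polarity of a parse node (a word, or a (head, argument) tuple), computed bottom-up."""
    if isinstance(node, str):
        return LEXICON.get(node, 0)
    head, argument = node
    arg_polarity = polarity(argument)
    if head in REVERSERS:
        return -arg_polarity              # e.g. kill + negative -> positive
    head_polarity = polarity(head)
    # Otherwise a non-neutral head wins; a neutral head passes its argument through.
    return head_polarity if head_polarity != 0 else arg_polarity

print(polarity(("kill", "disease")))                      # 1
print(polarity(("kill", "hope")))                         # -1
print(polarity(("not", ("wonderfully", "interesting"))))  # -1
```

Unlike the word-counting sketch, the structure tells the system that “not” applies to the whole phrase “wonderfully interesting”.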

intent and risk –
detection of future predictions or commitments in financial reports
to get early warning of customer actions, particularly intent to churn

Recognising deception in text.
difficult to detect in text: vocal, eye-gaze and posture cues are absent
can we find linguistic cues?
Pennebaker’s LIWC (Linguistic Inquiry and Word Count) – a system to use

Linguistic characteristics associated with true / false statements

Emotion detection –

A confusing variety of different theories of emotional state; five basic emotions:

  1. anger
  2. disgust
  3. fear
  4. happiness
  5. sadness

Ekman classification – corresponds with facial expressions.

Analytics at the edge – the effects of the Internet of Things (IoT) data explosion

Arkady Zaslavsky, Senior Principal Research Scientist CSIRO

Big data = Internet of Things.

A jet engine can create 10 TB of data each hour.

Square Kilometre Array – 3,000 radio telescopes, 30 PB of data per day.

3,000+ km array.

Brontobyte – 10 to the power of 27 bytes
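To put these numbers in perspective, a quick back-of-the-envelope conversion to a sustained data rate. It assumes decimal units (1 TB = 10^12 bytes); the brontobyte is an informal name rather than an official SI prefix.

```python
# Decimal byte multiples. "Brontobyte" (BB) is informal, not an official SI prefix.
BYTES = {"TB": 10**12, "PB": 10**15, "YB": 10**24, "BB": 10**27}

def gb_per_second(amount, unit, seconds):
    """Average rate, in GB/s, of producing `amount` `unit`s of data over `seconds`."""
    return amount * BYTES[unit] / seconds / 10**9

jet_engine = gb_per_second(10, "TB", 60 * 60)   # 10 TB each hour
ska = gb_per_second(30, "PB", 24 * 60 * 60)     # 30 PB per day
print(f"Jet engine: {jet_engine:.2f} GB/s")     # Jet engine: 2.78 GB/s
print(f"SKA: {ska:.0f} GB/s")                   # SKA: 347 GB/s

# How many days of SKA-scale output would fill one yottabyte?
print(f"{BYTES['YB'] / (30 * BYTES['PB']):,.0f} days")
```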


Analytics as the great enabler: how giving life to data powers ‘Dairy for life’

Kevin Ross, Chief Scientist of Optimisation Modelling Fonterra

 

Core analytics skill set:

  • formulate a problem
  • select the right tool
  • handle data and write code
  • communicate

5–10% of GDP

Supply Chain Guru – an off-the-shelf tool that does supply chain design.

Key lessons –

  • data cleaning is always in scope
  • big difference between project-based and maintained models, optimal design and business case
  • optimal financial decision and optimal decision; black boxes are dangerous.

Stream return = how much money you can make from a kilo of milk solids.

Panel discussion – What makes the best analytics team?

 

Dr Leif Eversen, General Manager Business Performance Analytics, Westpac

 

Julie Batch, Chief Analytics Officer, IAG
Liz Moore, Head of Research, Insights and Analytics, Telstra

 

 

  • tactics for rewarding great employees
    • development around three E’s
      • experience – stretch projects in a different area, work on another area, deepen the experience on projects and programs, move from revenue generation to customer experience
      • exposure – present the work to a range of stakeholders such as the CMO, showing off the work they are doing; the skills they need to develop to do this
      • education – providing the right training courses and getting the balance right.
    • remuneration isn’t the centrepiece of keeping great employees.
    • create the environment and culture
    • allow it to be disruptive.
    • set a good value proposition for employees.
  • how do you recruit for these skills
    • integrate technical specification and communicate those to the team.
    • combination of people with the right skills
    • and a small number of people who can integrate all these skills.
    • portfolio approach – integration of 
      • identify people who want to move into that area
      • integrate information and insights beyond analysis
      • pick up other business pieces, bring other information in.
  • 4th skill set – integration of data into the team
  • bringing together survey insight with revenue insight
    • market research piece picked up and used in a consumer space.
    • take an insight and apply it into a modelling project.  
  • what characterises a good player
    • telling the business something they don’t already know, in a way that has impact.
    • senior person to frame up the business case of the stats they are trying to understand.
  • softer qualities
    • curiosity.
    • courage to make a difference
    • drive to change the world – disruptive mindset.
    • looking for futurists – looking for the new paradigm
    • looking for “laziness”, i.e. people who automate tasks and try to avoid doing repetitive work.
  • data hunters
    • finding data, curating the data 
    • infrastructure
    • technical & integrated data scientists. 
  • disruptive skill set we are looking for.
    • huge benefit in having people with broader business experience before going into analytics
    • have much more credibility when getting into this area.
    • 30-year view of analytics – benefited from analytics’ barrier to entry; it was hard to get into the area.
    • need a high level of business acumen 
  • graduates – come through with no preconceived views, fresh mind, prepared to try new things and be different. 
  • double major in mathematics and marketing – rare to find people with a science degree and business degree. 
  • try and find people who look at a problem differently.

 

Utilising machine learning for smart, targeted advertising

Claudia Perlich, Chief Scientist Dstillery

 

  • Utilising machine learning…
  • A silent revolution…
  • Digital data, cross device.
  • Programmatic
  • Audience and segmentation studies

  1. optimise for people who will buy the product
  2. optimise for people whose thinking you can change by showing them the ad.

Wish list – the probability that the person who is seeing the ad will buy the product.
Predictive modelling = function approximation.

Fraud
    • Bots pose as humans
    • fake conversion events
Traffic patterns are non-human
data from bid requests
Big data is killing our favourite metrics.
Bots are easier to predict than humans.

 

Models for path to conversion: how many days before conversion do you need to target someone? 30 days before buying a house; for insurance, only moments before you buy – timeline to conversion
Where are the people who eat at Arby’s?
More importantly, where do they live?
Out of home – where to target people – at the bus shelter they go past each day?
Measuring causal effect –
A/B testing – many practical concerns
Estimate causal effects from observational data.
– use predictive modelling to estimate causal impact
Ads for prospecting should be different from ads for retargeting
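For reference, the basic arithmetic of an A/B test compares conversion rates between a randomly held-out control group and a group shown the ad. A minimal sketch using a one-sided two-proportion z-test with made-up counts; it ignores the practical concerns mentioned in the talk, and it is not the observational, model-based approach Claudia described.

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    """One-sided two-proportion z-test: does group B (shown the ad) convert more than A?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)      # conversion rate assuming no effect
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z) for a standard normal
    return p_b - p_a, z, p_value

# Made-up counts: 10,000 randomly held-out users vs 10,000 users shown the ad.
lift, z, p = ab_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"lift={lift:.3%}, z={z:.2f}, p={p:.4f}")
```

Randomisation is what makes the difference causal here; with observational data there is no held-out group, which is why modelling is needed instead.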
Pitfalls
  • Data quality issues
    • smartphone users jump from location to location, travel faster than speed of sound
    • people piles – location information at “standard locations” / default locations
  • Reliability / scale tradeoff
  • Probabilistic matching cross device.
  • Anything that can be faked will be faked.
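The “faster than the speed of sound” pitfall suggests a simple sanity check on location data: compute the implied speed between consecutive pings and flag impossible jumps. A sketch, assuming pings arrive as (timestamp in seconds, latitude, longitude) tuples and using the haversine great-circle distance; the threshold of roughly 343 m/s is the speed of sound in air at about 20 °C.

```python
import math

SPEED_OF_SOUND_MPS = 343.0   # approximate, in air at about 20 °C
EARTH_RADIUS_M = 6_371_000   # mean Earth radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def impossible_jumps(pings, max_speed=SPEED_OF_SOUND_MPS):
    """Return (t1, t2, metres) for consecutive pings implying speed above max_speed."""
    flagged = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(pings, pings[1:]):
        dist = haversine_m(lat1, lon1, lat2, lon2)
        dt = t2 - t1
        # Moving any distance in zero (or negative) time is also impossible.
        too_fast = dist > 0 if dt <= 0 else dist / dt > max_speed
        if too_fast:
            flagged.append((t1, t2, round(dist)))
    return flagged

pings = [
    (0, -33.8688, 151.2093),     # Sydney
    (600, -37.8136, 144.9631),   # Melbourne, 10 minutes later
    (4200, -37.8136, 144.9631),  # still in Melbourne an hour later
]
print(impossible_jumps(pings))
```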

 

Logistic Regression.
    • create a line; which side of the line people fall on decides whether to show them the ad
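A minimal logistic regression sketch, trained by batch gradient descent in plain Python. The “line” from the talk is the decision boundary w·x + b = 0: users on one side get a predicted probability above 0.5. The two features and six training users are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """P(convert | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logreg(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the average log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi   # gradient of log loss w.r.t. the logit
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Made-up users: [visits to the brand site, recent product searches]; 1 = converted.
X = [[0, 0], [1, 0], [0, 1], [3, 2], [4, 3], [2, 4]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logreg(X, y)

# Show the ad if the user falls on the "converter" side of the line w . x + b = 0.
show_ad = predict_proba(w, b, [3, 3]) > 0.5
```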

Audience size of one 

Probability of buying after seeing an ad
    • predictive modelling on digital activity
    • seeded
10 million columns / 1–50k positives
We must target not just the right user... but the right user, at the right time, in the right medium, in the right mindset

 

 

Mobile traffic – 10–40% of internet traffic is suspect; up to 40% could be accidental; 20% legit

Home IP address – we associate devices seen on your home IP address and give them a higher propinquity score
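One way to read the home IP idea is as a weighted co-occurrence score between devices. A sketch, assuming sightings arrive as (device_id, ip) pairs and that we already know which IPs are home connections; the weights of 3 and 1 are hypothetical, and this is not Dstillery's actual method.

```python
from collections import defaultdict
from itertools import combinations

def co_location_scores(sightings, home_ips, home_weight=3.0, other_weight=1.0):
    """Score device pairs by the IPs they share, weighting home IPs more heavily.

    sightings: iterable of (device_id, ip) pairs.
    home_ips: set of IPs believed to be home connections (hypothetical input).
    """
    devices_by_ip = defaultdict(set)
    for device, ip in sightings:
        devices_by_ip[ip].add(device)

    scores = defaultdict(float)
    for ip, devices in devices_by_ip.items():
        weight = home_weight if ip in home_ips else other_weight
        # Every pair of distinct devices seen on the same IP earns that IP's weight.
        for pair in combinations(sorted(devices), 2):
            scores[pair] += weight
    return dict(scores)

sightings = [
    ("phone", "203.0.113.7"), ("laptop", "203.0.113.7"),              # home Wi-Fi
    ("phone", "198.51.100.20"), ("stranger_phone", "198.51.100.20"),  # café Wi-Fi
]
print(co_location_scores(sightings, home_ips={"203.0.113.7"}))
```

The phone and laptop end up more strongly linked than the phone and a stranger's phone at the café, which is the intuition behind scoring home IPs higher.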
Modern analytics coming full circle

Pre-emptive shipping: Using analytics to predict product performance

Igor Elbert, Principal Data Scientist Gilt.com (USA)
Gilt pioneered flash sales
predict which items will be sold in ‘west’

  • Dealing with high cardinality
  • Cardinality attributes
  • Reducing cardinality
  • Clean the data – spellings, lowercase and uppercase
  • Clean it with Levenshtein distances
  • Cut the long tail – algorithm specific – over x percent of cases
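The cleaning steps above can be sketched as a small pipeline: normalise case and whitespace, merge rare spellings into a more frequent value within one Levenshtein edit, then collapse anything below a minimum share of rows into “other”. The thresholds stand in for the “x percent” in the notes and are hypothetical; Gilt's actual pipeline wasn't shown in this detail.

```python
from collections import Counter

def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def reduce_cardinality(values, max_edits=1, min_share=0.2):
    # 1. Normalise case and whitespace.
    cleaned = [" ".join(v.lower().split()) for v in values]
    # 2. Most frequent values become canonical; rarer near-duplicates map onto them.
    heads, mapping = [], {}
    for value, _ in Counter(cleaned).most_common():
        match = next((h for h in heads if levenshtein(value, h) <= max_edits), None)
        if match is None:
            heads.append(value)
        mapping[value] = match or value
    merged = [mapping[v] for v in cleaned]
    # 3. Cut the long tail: values below min_share of rows become "other".
    counts, n = Counter(merged), len(merged)
    return [v if counts[v] / n >= min_share else "other" for v in merged]

colours = ["Navy", "navy", " NAVY", "navey", "Black", "black", "blak", "Chartreuse"]
print(reduce_cardinality(colours))
```

Eight raw strings collapse to three levels, which is the kind of reduction that makes a high-cardinality attribute usable in a model.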

 

  • Cheat – peeking into the data set allowed
  • Slice the data set – each subset has its own set of attributes
  • Rough it up – royal blue is blue – clustering helps
  • Clustering brands helps new brands too

 

Subjective attributes – side effects business’s insights best practices

  1. Who is the buyer
  2. Who curated the item
  3. Who set the price
  4. Who is the photographer

 

Dig into what happened with items that didn’t sell, to understand where the system went wrong.

More subjectivity – ask strangers for data

Using Amazon Mechanical Turk

  • Quality responses
  • Quick turnaround
  • Produced several good predictors
  • User generated content
  • Easy to automate

Some dresses will sell… on a good day

  • Day of the week
  • Day of the year
  • Previous / next holiday
  • What else is on sale – possible halo effect
  • Price line-up
  • Proxy of traffic – more visitors, higher chance to sell
  • Time of sale – competing with other shoppers
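Several of these “good day” features come straight from the calendar. A sketch of day of week, day of year and distance to the previous and next holiday, using a hypothetical, hand-picked list of 2015 US dates; traffic, price line-up and what else is on sale would need Gilt's own data.

```python
from datetime import date

# Hypothetical, hand-picked 2015 US dates; a real model would use a maintained calendar.
HOLIDAYS = [date(2015, 1, 1), date(2015, 2, 14), date(2015, 5, 25), date(2015, 7, 4)]

def calendar_features(sale_day, holidays=HOLIDAYS):
    """Calendar features for one sale day: weekday, day of year, holiday distances."""
    previous = [h for h in holidays if h <= sale_day]
    upcoming = [h for h in holidays if h > sale_day]
    return {
        "day_of_week": sale_day.weekday(),            # Monday = 0 ... Sunday = 6
        "day_of_year": sale_day.timetuple().tm_yday,  # 1 ... 366
        "days_since_holiday": (sale_day - max(previous)).days if previous else None,
        "days_to_holiday": (min(upcoming) - sale_day).days if upcoming else None,
    }

print(calendar_features(date(2015, 2, 10)))
```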

Rewards

  • Dry runs look promising – shipping times reduced
  • Useful side effects
  • Propensity to sell – pick-and-pack optimisation
  • Propensity not to sell – insights on pricing, merchandising, inventory
  • Stress test for company logistics
  • Order routing, sale creation, shipping

 

Conclusions

  • Shop at GILT. We need data
  • See big picture
  • Clean and trim the data
  • Use subjective data

 


Learnings
Author: Stewart Barrett

Stewart Barrett is an agile, results-oriented, data-driven digital strategist and business-focused online marketer with over 10 years’ experience. He is passionate about businesses that challenge and disrupt markets, and is inspired and fascinated by the ever-changing world of digital.
