Montreal’s Yellow Pages Group is well known for advertising, especially in print. It competes in a segment that is expected to be dominated by Google and other online sources. Despite that competition, it currently has around 276,000 Canadian advertisers out of a potential pool of 1.2 million, or roughly 23 percent.
Yellow Pages recently embraced data science, ending the outsourcing of its big data applications. The company weighed the strategic cost implications of running those applications in-house, with the aim of improving its profit margin. It has spent millions to consolidate its 18 data centers into three, and it brought its outsourced business intelligence application in-house by building Hadoop clusters to host it.
The company has two analytics applications:
- Anametrix, used internally
- Yellow Pages Analytics, used by advertisers
Advertisers use Yellow Pages Analytics to track metrics such as visitors, page views and other KPIs, and to calculate potential ROI. The application runs on big data: one table holds 52 billion records and still delivers millisecond response times.
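As a rough illustration of the kind of calculation an advertiser might run on these metrics, the sketch below computes page views per visitor and a simple ROI using the standard (gain - cost) / cost formula. The function names and the sample figures are hypothetical; they are not taken from Yellow Pages Analytics, whose actual formulas are not described here.

```python
def pages_per_visitor(page_views: int, visitors: int) -> float:
    """Average number of pages viewed by each visitor."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return page_views / visitors


def roi(revenue: float, ad_spend: float) -> float:
    """Return on investment as a fraction: (gain - cost) / cost."""
    if ad_spend <= 0:
        raise ValueError("ad_spend must be positive")
    return (revenue - ad_spend) / ad_spend


# Hypothetical figures for one advertiser over one month.
engagement = pages_per_visitor(page_views=1200, visitors=400)
campaign_roi = roi(revenue=1500.0, ad_spend=1000.0)

print(f"Pages per visitor: {engagement:.1f}")
print(f"ROI: {campaign_roi:.0%}")
```

An ROI of 0.5 here means the campaign returned 50 percent more than it cost; a negative value would mean the advertiser lost money on the spend.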
Richard Langlois is the current IT director of Yellow Pages. He was hired in 2012 to transform the company by accelerating its move to digital. At the time, Yellow Pages Analytics was outsourced to a California firm. Strategic cost considerations led the company to bring the application in-house as part of that move. Langlois was given a blank cheque and 18 months to overhaul the infrastructure, part of a four-year plan.
Langlois spoke at a big data conference in Toronto in June 2014, where he argued that there is no need to reinvent the wheel for big data applications.
“What you know about business intelligence, data center architecture, development, strategic planning … we use all of that and big data.”
Yellow Pages is transitioning from print to online media. There are subtle differences between the two, and the company is looking for ways to exploit those differences for maximum returns.
“We need to provide location-based content so you bring business to the advertiser.”
Langlois presented the flowchart of Yellow Pages Analytics at the conference. The Hadoop clusters comprise 16 servers and 2 primary nodes, and 5 database machines run in production for availability and load balancing. An analytics services layer for business intelligence dashboards is planned, to make analytics easier for small and medium-sized companies and to relieve the IT department of extract, transform and load (ETL) operations. This does not include improving the collection of data or creating new applications, such as a unified tracking tool and methodology.
Despite the complexity of the flowchart, Langlois maintains that a big data roadmap is no different from an enterprise architecture roadmap. He added that it is important to decouple architecture from projects, so that changes in projects don’t affect the architecture.