Rakuten's Journey to HTTPS

William Moore, HTTPS

Rakuten’s Journey to HTTPS – Lessons So Far

Rakuten Group is currently migrating every service that uses HTTP to HTTPS. This includes not just websites but also our mobile applications and the other services we offer, for example ad and content services. The work is still ongoing, but globally over 80% of Rakuten’s services had completed their migration by March 2017. We want to share the lessons we have learned so far from managing such a large-scale migration.

Our Task

Rakuten has a large range of businesses and services spread across three continents. Together these businesses and services have around 250 individual websites and more than 350 ‘non-website’ services. Rakuten’s largest website (Ichiba Japan) has over 300 million product pages and over 40,000 merchants. All of these had to be migrated to HTTPS.

Adding another layer of complexity is the Rakuten Ecosystem concept, which means that many services have interdependencies. To migrate services, it is necessary to understand these interdependencies and co-ordinate the migrations with them in mind.
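To make the co-ordination problem concrete, here is a minimal sketch of how interdependencies could be modelled as a graph. It is an illustration only, not a description of Rakuten's tooling: the service names are hypothetical, and it assumes that a service should switch to HTTPS only after everything it embeds or calls is already available over HTTPS.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical services: each key embeds or calls the services in its set.
depends_on = {
    "shop-site": {"ad-service", "image-cdn"},
    "ad-service": {"tracking-pixel"},
    "image-cdn": set(),
    "tracking-pixel": set(),
}


def migration_order(graph):
    """Return services ordered so that dependencies come before dependents."""
    return list(TopologicalSorter(graph).static_order())


def indirect_dependents(graph, service):
    """Return every service that relies on `service`, directly or transitively."""
    found = set()
    frontier = [service]
    while frontier:
        current = frontier.pop()
        for name, deps in graph.items():
            if current in deps and name not in found:
                found.add(name)
                frontier.append(name)
    return found


order = migration_order(depends_on)
print(order)
print(indirect_dependents(depends_on, "tracking-pixel"))
```

In this toy graph, "tracking-pixel" turns out to be a dependency of "shop-site" even though "shop-site" never references it directly, which is the kind of indirect relationship that is hard to see without a group-wide view.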

Image of Rakuten ecosystem

Lastly, our most important consideration is always our users. Globally, the Rakuten Ecosystem has over 900 million of them!

Our Approach

Because of this complexity, we took the decision to manage our migration to HTTPS as a single program of co-ordinated activities. The structure of this program was aligned with our internal organization structure:

  • Rakuten is divided into 11 group companies and a number of horizontal ‘group headquarters’ functions. Rakuten also has regional hierarchies within this structure.

  • For each group company, each relevant group headquarters function and each region, one or more program ‘leads’ were assigned to co-ordinate efforts in that area on behalf of the program. The leads reported into a single Program Management Office (‘PMO’). A Program Manager headed the PMO and reported to a steering committee of senior executives from the Rakuten Group.

Image of the Program structure

We structured our program in this way to achieve the following:

  • Centralised and streamlined management of cross company dependencies – Creating a ‘group wide’ view to support risk analysis and prioritisation.

    All product and dependency information was mastered in a dedicated JIRA project, from which BI dashboards, analyses and reports were produced. The teams working on each product or dependency updated this JIRA project directly, and all teams could check the latest detailed status there. This reduced the communication overhead for technical teams and gave them more time to focus on their own area of responsibility.

    Image illustrating dependencies

  • Group wide adoption of best practice for risk, issue and assumptions management – cascading relevant, actionable and specific information to technical teams / management / business members, removing the noise whilst ensuring that everyone focuses on the most important risks.

    Our first action was to define and share a set of ‘common basic risks for website migrations’ (listed below), which each website team was required to consider and mitigate. This gave us a basic level of risk management and a systematic way of minimizing migration issues across the program.

    Our ‘Common Basic Risks for Website Migrations’

    Table listing the risks

  • Centralised HTTPS related data and reporting – creating a single source of data (using JIRA) for all teams, which fed into a single set of management dashboards and reports. This has helped us to minimize miscommunication and maximise data quality. It also enabled quick escalations for real issues.

    The PMO provided guidance for all teams on how the status of their projects should be assessed, including clear criteria for judging a product’s migration to be ‘Red’, ‘Amber’ or ‘Green’. This minimized misunderstandings between the PMO, Leads and project teams.

    In addition, the status information for each service was only accepted into reports following a review by the PMO and the relevant Lead and, if necessary, the project team for the product. This maintained the quality of the information and allowed the PMO to make an expert judgement in certain cases.

  • Formalised approach to best practice adoption and information sharing – by utilising the power of Rakuten’s ‘Technical Community’ we encouraged engineering teams to help each other. We also created a community-powered Q&A service for any HTTPS query.

  • Updated group wide policies, regulations and processes to embed HTTPS – we aim to prevent slippage back to HTTP through a multi-levelled approach (e.g. policies forbidding the use of HTTP on new or updated services, updated QA processes to check for mixed content, utilising existing scanning processes to help validate HTTPS migrations, and updates to internal user guides for systems related to content creation or publishing).
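As an illustration of the kind of mixed content check a QA process might include, the sketch below scans an HTML page for subresources loaded over plain HTTP. It is a simplified example under stated assumptions, not the scanner Rakuten used: the class and function names are hypothetical, the URLs are placeholders, and it only inspects `src` attributes and stylesheet links, so it will miss resources loaded from CSS or JavaScript.

```python
from html.parser import HTMLParser


class MixedContentFinder(HTMLParser):
    """Collects (tag, url) pairs for subresources referenced over http://."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Most embedded resources (img, script, iframe, ...) use `src`.
        url = attr_map.get("src")
        # Stylesheets are loaded through <link rel="stylesheet" href="...">.
        if tag == "link" and "stylesheet" in (attr_map.get("rel") or "").lower():
            url = attr_map.get("href")
        if url and url.lower().startswith("http://"):
            self.insecure.append((tag, url))


def find_insecure_resources(html):
    """Return the insecure subresources found in an HTML string."""
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure


sample = """
<html><head>
<link rel="stylesheet" href="http://static.example.com/site.css">
<script src="https://static.example.com/app.js"></script>
</head><body>
<img src="http://images.example.com/banner.png">
<a href="http://example.com/about">About</a>
</body></html>
"""

for tag, url in find_insecure_resources(sample):
    print(f"insecure <{tag}>: {url}")
```

Ordinary links (`<a href>`) are deliberately ignored, because navigating to an HTTP page is not mixed content; only resources loaded into the HTTPS page itself are reported.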

What we Learned

  • Our biggest challenge is clearing mixed content, especially on larger sites. We asked all our teams to consider how they can monitor content during and after their migration. One of the approaches we recommended was to configure Content Security Policy (CSP) to report HTTP content (however, this may not be suitable for all sites). It is also important to remember that CSP reporting should not be configured until you have migrated the majority of your content, otherwise you could generate extremely large logs!

  • The number of dependencies was large (at a program level we were tracking over 3000), but the biggest surprise was how often a team learned that their service had become a dependency for other services without their knowledge (sometimes indirectly, by inclusion in a service that was then incorporated into another service). Without a group wide view this would have been very hard for the individual teams to identify and manage.

  • It is natural that people overlook risks or impacts outside their area of responsibility. Use techniques like our ‘Common Basic Risks’ (see above) to help teams cover all their risk points.

  • HTTPS adoption is a great opportunity to do a number of other things, like refactoring your site’s code, introducing stricter controls for publishing content, reviewing and purging legacy content, enabling HTTP/2, etc. – use the opportunity!

  • Many websites had dependencies on external services that had not adopted HTTPS. Rather than removing these services, we tried to work with the companies that provide them so that their content / service was also updated. Sometimes this involved several months of negotiations. The duration cannot be planned for, so be prepared for the possibility that you will need to do this. Overall this was a very positive experience – working collaboratively with our partners and helping them to improve their services for mutual benefit.

  • With any big program, 80% of the work of the program team is communication. With so many stakeholders and different styles of working, the same message needs to be delivered in many different formats, again and again. We used a mix of:

    • Formal reports

    • Newsletters

    • Online dashboards

    • Enterprise social network

    • Ad hoc email

    • Topic specific workshops

    • Information sharing meetings

    • Presentations at regular internal forums like tech conferences or seminars

    • Presentations at leadership meetings

But remember that the communications need to be targeted by audience! The language and level of detail suitable for communicating with a front end web developer will probably not be suitable for communicating with business people or management. We had to regularly check the effectiveness of our communications and refine our approach to get our messages understood.
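Returning to the mixed content lesson above, the Content Security Policy reporting approach can be sketched as follows. This is a minimal example under stated assumptions rather than Rakuten's actual configuration: the `/csp-report` endpoint and the function names are hypothetical, and the policy shown is one commonly used report-only policy for surfacing HTTP subresources. The report format assumed is the JSON body that browsers send to a `report-uri` endpoint.

```python
import json
from collections import Counter

REPORT_PATH = "/csp-report"  # hypothetical reporting endpoint


def report_only_header(report_path=REPORT_PATH):
    """Build a report-only CSP header that flags non-HTTPS resources without blocking them."""
    policy = (
        "default-src https: 'unsafe-inline' 'unsafe-eval'; "
        f"report-uri {report_path}"
    )
    return ("Content-Security-Policy-Report-Only", policy)


def summarise_reports(raw_reports):
    """Count violations per blocked HTTP URI from raw JSON report bodies."""
    counts = Counter()
    for raw in raw_reports:
        report = json.loads(raw).get("csp-report", {})
        blocked = report.get("blocked-uri", "")
        if blocked.startswith("http:"):
            counts[blocked] += 1
    return counts


# Example report bodies, shaped like those a browser would POST to the endpoint.
reports = [
    json.dumps({"csp-report": {
        "document-uri": "https://www.example.com/item/1",
        "blocked-uri": "http://images.example.com",
        "violated-directive": "default-src https: 'unsafe-inline' 'unsafe-eval'",
    }}),
    json.dumps({"csp-report": {
        "document-uri": "https://www.example.com/item/2",
        "blocked-uri": "http://images.example.com",
        "violated-directive": "default-src https: 'unsafe-inline' 'unsafe-eval'",
    }}),
]

name, value = report_only_header()
print(f"{name}: {value}")
print(summarise_reports(reports).most_common())
```

Aggregating by blocked URI, rather than storing every report, is one way to keep log volume manageable on large sites; every page view that still loads an HTTP resource produces a report, which is why enabling reporting before most content has been migrated can generate very large logs.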