Google Cloud Summit Chicago 2017 – Cloud Spanner Database

Spanner is Google’s new DBaaS offering.  Google touts it as having the best features of both a relational database and a document or NoSQL database.  So, why did Google create Spanner?  Google needed:

  • Horizontal scalability
  • ACID transactions
  • No downtime
  • Automatic sharding
  • Seamless replication

Cloud Spanner is Google’s mission-critical relational database service.  It was originally built to run Google AdWords internally, but is now exposed as a public service.  It is a multi-regional database and can span millions of nodes.

Open Standards

  1. Standard SQL (ANSI SQL:2011)
  2. Encryption, Audit logging, IAM
  3. Client libraries (Java, Python, Go, Node.js)
  4. JDBC driver

Architecture

Spanner is provisioned in instances, whose replicas are distributed across multiple zones.  This architecture allows for high availability.  The customer can choose which regions the database instances are placed in.  Writes to the database are synchronous and are replicated across nodes.

Spanner supports an interleaved data layout, which specifies that related rows be stored in close physical proximity on disk.  The result is much better read performance.  Spanner is designed for large amounts of data; it is not as efficient for small data sets.
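As a sketch, interleaving is declared in the table DDL.  The table and column names below are hypothetical; the INTERLEAVE IN PARENT clause is what requests co-located storage of child rows with their parent row:

```sql
-- Hypothetical parent/child pair: each customer's orders are stored
-- physically close to the customer row, so reads that join the two
-- tables touch a contiguous range of the key space.
CREATE TABLE Customers (
  CustomerId INT64 NOT NULL,
  Name       STRING(MAX),
) PRIMARY KEY (CustomerId);

CREATE TABLE Orders (
  CustomerId INT64 NOT NULL,
  OrderId    INT64 NOT NULL,
  Total      FLOAT64,
) PRIMARY KEY (CustomerId, OrderId),
  INTERLEAVE IN PARENT Customers ON DELETE CASCADE;
```

Note that the child table’s primary key must be prefixed by the parent’s key, which is what makes the physical co-location possible.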

Google Cloud Summit 2017 Keynote

This is the first Google Cloud Summit in Chicago, and it is in some ways a coming-out event for Google Cloud.  The keynote speaker is Scott McIntyre, Director, Google Cloud.  There have been 500+ releases in the past 6 months, and 6.5 million businesses use GCP today.  In addition to this customer base, Google has built an ecosystem of business partners for its Cloud Platform.  Google believes that all companies either are, or will become, data companies.  The transportation industry provides a dramatic example of this.

The advantages of Google Cloud include optimizing business operations by hosting business infrastructure services in the cloud.  Collaboration is another area that Google touts as an advantage of their cloud services.  The acceleration of business is also one of their advantages.  They follow a philosophy of openness, which goes beyond just open source software.

The focus of the Summit is to promote the Google Cloud Platform to the Chicago and larger Midwest business community.

Ulku Rowe, Technical Director, Financial Services Office

The 5 Drivers of cloud decision are:

  • Reliability
  • Security
  • Excellent Support
  • Performance
  • Cost Effectiveness

Google serves over 1 billion end users today.  40% of the world’s internet traffic goes through Google’s network.

Google also provides a private, ultra-fast backbone, which it touts as more secure than many public access points.  Google provides a “Layered Defense in Depth Security”.  They follow a security paradigm of least trust.  Google introduced the Titan chip, which is used to secure hardware.

Security innovation

  • Identity-aware proxy – application level security, which has more granular security than a corporate VPN
  • Data loss prevention API

The security layer is built into every layer including:

  • GCP
  • G Suite
  • Chrome and Android – scan 6 billion mobile apps to make sure they are not infected with malware

Roberto Bayardo, Distinguished Software Engineer

Bayardo has worked on machine learning within Google.  One myth about breakthroughs is that they happen in isolation.  With G Suite, Google is dedicated to saving businesses time via collaboration.  Files in G Suite act as conversations.  Multiple people can collaborate on a single document in real time.

Smart Reply in Gmail uses machine learning to suggest replies to emails based upon the user’s observed behavior.  A demo was performed in which Google Sheets used natural language processing to understand a question a user asked and provide an answer in real time.

Andrew Lewis, Whirlpool, Senior Manager

Whirlpool went live with G Suite about 3 or 4 years ago and that was their first foray into the cloud.  They built a team called “winning workplace” made of several department representatives.  Google Hangouts was one technology that made a huge difference at Whirlpool.  Before the use of Hangouts, video conference capabilities were limited.  Using the Google Cloud has changed the business conversation at Whirlpool.

Scott McIntyre – Summary

Scott McIntyre talked about the tools made available on Google Cloud Platform.  App Maker allows developers to write an app once and have it work across multiple platforms.  Cloud Search is a tool that allows users to search across all of their G Suite content.  G Suite uses simple controls to allow for easy administration of the tools.

Miles Ward, Director, Solutions Google Cloud

There has been an acceleration of businesses trying to use the cloud to increase productivity.  Google has worked to democratize data within organizations to increase collaboration and innovation.  Google develops tools and platforms to facilitate this and open sources much of this technology.

Cloud Spanner

This is billed as the first horizontally scalable relational database.  Cost is calculated in real time as the database architecture is specified.  Databases can be deployed within seconds.  The database appears to be lightning fast: a demo query over several terabytes of data returned in less than a second.

Cloud AI

Google provides large vendor free datasets that companies can use.  Google also provides machine learning training to allow customers to come up to speed rapidly.  Google provides pre-trained models or allows customers to train their own models.  Google has provided the open source TensorFlow machine learning platform.

Demo

Video Intelligence API – can analyze a video and provide context and relevance data on the video.  The API can also scan a video catalog and retrieve relevant video content.

The platform that Google provides allows developers to focus on just designing and developing code rather than getting distracted by provisioning and managing infrastructure.

Bradley Burke, CTO, Network Insights

They have a large streaming analytics platform as well as a large data platform, both enabled by Google Cloud Platform.  GCP allowed the organization to scale its business.  Analysts can now write queries against BigQuery, which has brought data science to the masses.

Kris Baritt – Technical Director, Office of the CTO

Google spends significant resources on the partnerships it has with its customers.  One goal of the relationship is “getting out of the software jail”.  There should be a shared success model.  The partnership should be a commitment, not a mandatory sentence.  There should be flexible deployment models and flexible support models.

There are 10+ years of open source projects in:

  • Linux
  • Python
  • C++
  • Git
  • Kubernetes – tool to manage containers

Google does not dictate VM configurations; customers can configure their VMs on Google Cloud.  Google has per-second billing.

Google announced “committed-use discounts” yesterday.  On average, customers save 60%, with $0 startup costs, vs. on-premises infrastructure.

Ratnakar Lavu, Kohl’s, CTO

Kohl’s is a brick-and-mortar retailer with a growing digital presence.  They have two data centers with rack-and-stack servers.  They are moving to the cloud in order to scale for peak periods such as the holiday season.  Kohl’s selected Google Cloud Platform because the platform is secure and flexible, has low latency, and Google is innovative.  The machine learning platform was also attractive to Kohl’s.

Kohl’s developers can now spin up servers on the fly.  This has increased costs, so Kohl’s is working with Google to manage that.

Scott McIntyre – Closing

Scott reiterated the primary themes that were highlighted in the keynote.  Overall this was a solid keynote for a technical crowd.

MongoDB World 2017: Building Micro-Services Based ERP System

Jerry M Reghumadh of Capiot gave a talk on building micro-services from the ground up.  The legacy system that his group replaced was monolithic and rigid.  The solution the Capiot team proposed to the client placed each component of the ERP system into its own atomic component.  Everything built on the platform was an API.  These APIs were very “chatty”.

The engineering decisions that were made included the choice of NodeJS and MongoDB as the base technologies for this platform.  NodeJS was selected in part because of its small footprint, which lowered the barrier to entry for the application.  Java was considered but was too heavy for the needs of the project.  MongoDB was selected for the data persistence layer because it saves data as documents and did not require marshaling and unmarshaling of data.  MongoDB also allowed the implementation team to use a flexible schema, and it offered greater ease of clustering and sharding versus the other options available for this project.  This allowed the developers to implement the system without relying on a dedicated database administrator.

The technology stack included:

  • NodeJS
  • ExpressJS
  • Swagger.io
  • Mongoose.js
  • Passport.js

The team implemented a governance model that forced any exposed API to be exposed in Swagger.  This prevented the proliferation of “rogue” APIs.  Any API not exposed in Swagger would not work properly in the system.  Mongoose allowed the team to enforce a schema.

MongoDB World 2017: Using R for Advanced Analytics with MongoDB

Jane Uyvova gave a talk on analytics using MongoDB and the R statistical programming language.  She began her talk by discussing analytics versus data insight.  R has become a standard for analyzing data due to its open source nature and easy licensing versus legacy tools such as SAS or SPSS.

Use Cases

  • Churn Analysis
  • Fraud Detection
  • Sentiment Analysis
  • Genomics

Use Case 1: Genomics

The human genome consists of billions of base pairs.  The dataset used came from the HapMap project.

  • HapMap3 was the dataset
  • Bioconductor was the R library used for this analysis
  • RStudio was the analysis environment
  • The mongolite connector linked R to MongoDB

The MongoDB data aggregation framework was used to aggregate the data by region.

In doing genomic analysis, schema design becomes important in making the analysis easier and more effective.

Use Case 2:  Vehicle Situational Awareness

  • Chicago open data was used as the dataset
  • The dataset was loaded into MongoDB and Compass was used for the initial analysis
  • R was used to analyze the data and to extract it for a density plot (ggplot2)
  • The MongoDB flexible schema allows a wide variety of data to be included in the analysis

One issue that must be addressed is scalability.  Since R is single-threaded, data scientists run up against data volume constraints.  One solution is to use Spark to parallelize and scale R.

A MongoDB/Spark architecture can include an operational component, consisting of the application and a MongoDB driver, and a data management component, consisting of the MongoDB cluster.

MongoDB: Migrating from MongoDB on EC2 to Atlas

Atlas was introduced by MongoDB as their database-as-a-service offering.  Atlas allows administrators, developers, and managers to deploy a complete MongoDB cluster in a matter of minutes.  Some basic requirements for using Atlas include:

  • Atlas requires SSL
  • Set up AWS VPC peering
  • VPN and Security Setup
  • Use Amazon DNS (when on AWS)

The preparation work that must be done includes:

  • Picking a network CIDR that won’t collide with your networks
  • Using MongoDB 3.x with the WiredTiger storage engine
  • Test on replicas using testing/staging environments

Atlas supports the live migration of data from an EC2 instance.  The mongomirror migration tool can be used to migrate the data.

MongoDB World 2017: Video Games

Jane McGonigal’s keynote talk started by highlighting that 2.1 billion people around the world play video games, and that 12 billion hours a week are spent playing them.  There is something energizing about gameplay.  Brain scans suggest that the opposite of play may be depression.

Gameplay seems to increase activity in the hippocampus portion of the brain.  One interesting fact is that gamers fail at gameplay 80% of the time.  In spite of this, gamers tend to activate the ability to learn.

Jane does game research and has analyzed the psychology of gameplay.  Pokémon Go was the fastest-downloaded app in the history of apps, with 500 million downloads in 30 days.  Why was this game so popular?  Pokémon Go elicits a sense of opportunity for the players and engages users.  When 650 million people start walking around playing a mobile game, a lot of data is generated.  Analyses have been done on the usage statistics of the game.

Augmented reality may be a more compelling experience and platform for gaming than virtual reality.  The lessons learned from gaming data are:

  1. People want to engage the real world in an interesting way
  2. Games will be a huge driver of data collection

The app called Priori listens to a person’s voice to determine their mental state.  Emotient is a technology that determines a person’s mood based on their facial expression.  Emotiv is a sensing device that can detect emotions.  Mooditood is a social network to share how people feel.

MongoDB: BI Connector & Tableau

Ronan Bohan from MongoDB and Vaidy Krishnan from Tableau presented the Jumpstart session.  The BI Connector was just shipped by MongoDB.  The mission of Tableau is to harness the power of data.  The three core tenets of Tableau’s development philosophy are:

  1. Connectivity to access all data
  2. Design software for deeper thinking
  3. Ability to scale data and provide analytics on that data

There is a focus on Big Data.  The goal is to provide focus and make the data discoverable.  Vaidy stated that Tableau is about data access: it takes data from the application layer to the level of actionable business insight.  Tableau has been adopted by 55,000 customers all over the world.

Ronan began a discussion of how to connect MongoDB data to the Tableau platform.  Tableau was originally designed to work with structured data from relational databases.  MongoDB is designed to store semi-structured data.  This made MongoDB and Tableau incompatible.  This problem has been solved by the BI connector.  This allows MongoDB’s semi-structured data to connect to the Tableau visualization platform.

Ronan provided the audience with a demo using the BI Connector to link MongoDB data to the Tableau visualization tool.  He displayed a DRDL definition file, a YAML file used to map MongoDB data to a schema that applications consuming structured data can understand.
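Roughly, a DRDL file maps collections to tables and fields to typed columns.  The fragment below is illustrative only: the database, collection, and type names are made up, and the exact key spellings should be checked against the BI Connector documentation.

```yaml
schema:
- db: retail                 # MongoDB database to expose
  tables:
  - table: orders            # SQL table name Tableau will see
    collection: orders       # backing MongoDB collection
    pipeline: []
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: varchar
    - Name: total
      MongoType: float64
      SqlName: total
      SqlType: numeric
```

Given a mapping like this, the BI Connector presents the semi-structured collection to Tableau as an ordinary relational table.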

Detailed information on the MongoDB BI Connector can be found on the MongoDB website. (https://www.mongodb.com/products/bi-connector)

Use cases for Tableau with MongoDB include ad hoc analysis.

The future of the MongoDB BI Connector will include additional push-down capabilities, improved authentication, and centralized management tools.