C2C Monthly Recap: June 2022
Machine Learning is an essential component of every major tech product today. With tools like BigQuery ML, you don’t have to be a data scientist to quickly and easily incorporate ML into your applications.

At a recent C2C Deep Dive event hosted by the Google Cloud startups team, Google Cloud AI/ML Specialist Customer Engineers Mike Walker and Rob Vogelbacher explained how you can use BigQuery ML to power insights for you and your customers. There are many built-in algorithms for regression, classification, clustering, forecasting, and recommendations that you can train with just a few lines of SQL. All these help you learn more from your data in a short time and in a cost-effective way. The models you build can be called from BigQuery or from external applications.

The recording from this session includes the following topics:
(0:00) Introduction from C2C
(2:35) What is BigQuery?
(6:00) Decoupled storage and compute on BigQuery
(8:00) Typical ML Workflow
(10:00) BigQuery ML and AI
(11:30) BigQuery ML-supported models and features
(17:30) BigQuery Use cases
(18:30) BigQuery Explainable AI
(21:05) AutoML Tables and BigQuery ML
(23:25) BigQuery ML Example Models: Miami Housing Dataset
(41:30) Audience Q&A

Watch the full recording of the conversation below:
Machine Learning is an important component of every major tech product today. However, not everything beyond Excel sheets is big data, and not all big data problems require ML. The most important function of ML should be to supplement the product.

Decision makers in the ML and big data spaces should know how an ML mindset differs from a traditional software development mindset. Hear from startup mentor, program manager, and trained architect KC Ayyagari (@kcayyagari), Senior Customer Engineer at Google Cloud.

The recording from this session includes the topics listed below, plus plenty of conversation from the open Q&A with community members present at the live event:
(0:00) Welcome and introduction from C2C and the Google Startups Team
(3:30) Agenda overview
(5:00) What is Machine Learning?
(16:55) How ML is different from normal software development and how to represent physical problems in data
(42:30) The do’s, don’ts, and focus areas in the ML mindset for managers

Watch the full recording below:

Preview What's Next
Join the Google Cloud Startups group to stay connected on events like this one, plus others we have coming up:
Druva Reddy, a Solutions Architect specializing in ML at Google Cloud, discussed Vertex AI, which brings all of Google Cloud’s ML services together under one unified UI and API. In Vertex AI, you can now easily train and compare models using AutoML or custom code training and store all of your models in one central model repository. In this overview session, Druva covered some major components of the Vertex AI platform, from training to prediction to MLOps services. This recording also includes a demo of an end-to-end example that shows these services in action.

Review all parts of the presentation, including:
(00:00) Introduction to Google Cloud Startups team
(05:05) Introduction to functional solutions with AI
(10:15) ML on GCP with Vertex AI
  - What’s included in Vertex AI
  - Choosing the right tools or pre-trained models
  - Low/No code
(25:55) Operationalizing ML
  - MLOps, life cycle, and framework
  - Using Vertex AI with MLOps
(32:55) Vertex AI demo
(44:05) Open community questions

Extra credit:
- Google Cloud Vertex AI Docs
- Get started in Cloud Console
- Best practices for implementing machine learning on Google Cloud

To connect with Druva, reach out to him directly in the Google Cloud Startups community and tag @Druva Reddy
On Tuesday, March 8, also known as International Women’s Day, C2C France Team Leads @antoine.castex and @guillaume blaquiere were excited to welcome Google Lead Developer Advocate @Priyanka Vergadia to host a powerful session for the Google Cloud space in France and beyond. These sessions intend to bring together a community of cloud experts and customers to connect, learn, and shape the future of cloud. At this C2C Connect event, Vergadia led a broad and enthusiastic discussion about Vertex AI and the MLOps pipeline.

60 Minutes Summed Up in 60 Seconds
- ML and AI are the cornerstone technologies of any company that wants to leverage its data value.
- ML can be used across different platforms, including Google Cloud. BigQuery ML is a key example of serverless ML training and serving.
- Vertex AI is the primary end-to-end AI product on Google Cloud and interacts with many other Google Cloud products.
- Low-code and no-code users can reuse pre-trained Vertex AI models and customize them to fit their business use cases. It’s perfect for beginners and for users who are not ML engineers.
- Advanced users can leverage Vertex AI’s managed Jupyter Notebook to discover, analyze, and build their models.
- Vertex AI also allows users to train models at scale, to deploy serverless models, and to monitor drift and performance.
- As Vergadia reminded the audience, ML engineering makes up only 5% of the effort that goes into the ML workflow. The upstream steps (data cleaning, discovery, feature engineering preparation) and the downstream steps (monitoring, retraining, deployment, hyperparameter tuning) must be optimized to save time, effort, and money.
- To this end, Vertex AI supports a pipeline definition, based on TFX or Kubeflow Pipelines, to automate the end-to-end tasks around ML engineering. This automation practice is known as MLOps.

Watch the full recording of the session below:

Despite its 60-minute time limit, this conversation didn’t stop. Vertex AI is a hot topic, and it certainly kept everyone’s attention. The group spent time discussing data warehouses, data analytics, and data lakes, focusing on products like BigQuery, Data Studio, and Cloud Storage. Attendees also offered their own feedback on the content of the session. For example, halfway through the presentation, Soumo Chakraborty asked how users can integrate ML pipelines in a CI/CD pipeline, and pipeline integration became a focal point of the remainder of the discussion.

Preview What's Next
These upcoming C2C events will cover other major topics of interest that didn’t make it to the discussion floor this time around:
- Make the Cloud Smarter, April 12, 2022
- Looker In the Real World with Looker PM Leigha Jarett, May 10, 2022 (In-person event in Paris)

If these are topics you’re eager to explore at future events, be sure to sign up to our platform!

Extra Credit
Looking for more Google Cloud product news and resources? We got you. The following links were shared with attendees and are now available to you:
- Vertex AI
- BigQuery ML
- C2C Events
The Bank of England (BoE), one of the world’s oldest central banks, is one of the most visible and high-profile investors in innovation. Over the last decade, it has developed its own innovation lab, with projects including The Bank of England Accelerator, Her Majesty’s Regulatory Innovation Plan and The Regulatory Sandbox. It introduced a RegTech cognitive search engine and uses artificial intelligence (AI) technologies for chatbots and predictive real-time insights. More recently, the Bank made headlines with its plans for a “digital pound” on the blockchain, called Britcoin, which will use AI in its executable smart contracts.

Cognitive search engine
The BoE employs a Swiss-made cognitive search engine as its enterprise search solution. The tool uses AI and ML to gather data from multiple sources and deliver real-time relevant responses to users’ questions. The Bank also embeds it in its CRM to improve client conversations and reduce meeting preparation times. Users find answers to their questions up to 90% faster than they would with a manual search. This tool not only boosts productivity and improves client trust but also makes it easier and simpler for the Bank to comply with ever-changing regulations.

Chatbots
Chatbots the BoE uses for various services include:
- Functional chatbots that help customers with routine questions, such as directing callers to the closest ATMs to their locations.
- More sophisticated AI conversational assistants that feed customers investment recommendations and real-time market-related news, among other industry-related data.
- Chatbots using a combination of predictive analytics and prescriptive analytics to give decision-makers at the BoE real-time insights. Examples include helping BoE executives gauge their biggest competitors in the micro-lending space and helping them determine which customer segment they should target for their advertising for a new mobile app.
Britcoin
Britcoin is the Bank of England's plan for a digital currency accepted by retailers and other companies in lieu of debit and credit cards. Owners would have limits on how much Britcoin they could hold initially, but conversion to sterling and its transactions would take minutes. Unlike most cryptocurrencies, Britcoin will be a stablecoin, meaning it will tether itself to UK currency to avoid the problems of crypto fluctuations. Supporters appreciate that Britcoin would use AI-enabled smart contracts to execute DeFi transactions that are cheaper, faster, and more transparent than online payments and money transfers. Critics fear the innovation could lead to financial instability, along with higher loan and mortgage rates, among other problems. To resolve these issues, a task force has been assembled to report on the merits of the CBDC (Central Bank Digital Currency) by the end of this year.

Why the Bank is interested in AI
In her 2021 keynote address at the FinTech and InsurTech Live event on how the Bank of England uses AI, Tangy Morgan, an independent BoE advisor, described how the Bank conducted a survey assessing how banks headquartered or operating in Britain have used machine learning and data science during Covid-19, and how the BoE can profit from that report. The BoE found that the use of AI was growing at an exponential pace and could benefit the Bank in various ways. Possible applications of AI in this context include:
- Money laundering prevention, where AI identifies patterns of suspicious behavior to support anti-money laundering (AML) efforts.
- Underwriting and pricing applications, where big data analytics scrutinizes customers’ risk profiles, tailoring premiums to match individual risks.
- Credit card fraud detection, whereby AI analyzes large numbers of transactions to detect fraud in real time.

The Bank of England asserts that “developments in fintech … support our mission to promote the good of the people of the UK by maintaining monetary and financial stability.”

Are you based in the UK? What do these uses of AI bring to mind for you? Write us on our platform and let us know.
The Startups Roundtable series hosted by C2C and Google Cloud Startups continued on Tuesday, Jan. 25 with another session on AI and ML, this one devoted solely to technical questions. These roundtable discussions are designed for startup founders seeking technical and business support as they realize their visions for their products on the Google Cloud Platform. This time, 10 Googlers, including 6 Customer Engineers, led private discussions in small groups with more than forty guests from the C2C community. Watch the introduction to the event below:

As in the previous Startups Roundtable, after the introduction, the hosts assigned the attendees to breakout rooms where they could ask their questions freely with the attention of the Google staff on the call. The breakout rooms in these sessions are not recorded, but C2C Community Manager Alfons Muñoz (@Alfons) joined one of the conversations to gather insights for the community. In this breakout room, Google Customer Engineer Druva Reddy (@Druva Reddy) explained how to understand the value proposition a startup offers and how users will interact with the business. Reddy advised guests to focus on having a vision of the market and to build a product with a high level of abstraction, rather than focusing simply on the data-specific tools they are going to use.

According to Muñoz, after the time allotted for the discussions in the breakout rooms ended, the conversations kept going. Guests had more questions to ask and more answers to hear from the Google team. The hosts invited all attendees to bring their questions to the C2C platform for the Googlers to answer after the event. Two guests took them up on the offer, and Reddy wrote them both back with detailed advice.

Markus Koy (@MarkusK) of thefluent.me wrote:

Hi everyone,
I am using the word-level confidence feature of the Speech-to-Text API in my app (POC) https://thefluent.me that helps users improve their pronunciation skills. Is there an ETA when this feature will be rolled-out for production applications and if so, for which languages?
@osmondng, @Druva Reddy thank you for offering to reach out to the Speech API team.
Markus

and Reddy wrote back:

Hi Markusk,
It was great chatting with you!!
The Product team is aiming for Word Level Confidence General Availability stage (GA) by end of Q2 2022. Regarding languages supported, currently it supports English, French and Portuguese and that being said, multiple languages will be supported as we rollout the support for other languages in phases.
Please stay tuned and checkout announcements here- https://cloud.google.com/speech-to-text/docs/languages.
Thanks,
Druva Reddy

The next day, Erin Karam (@ekaram) of Mezo wrote:

Hello,
We are looking for guidance with training our DialogFlow CX intent. Our model is limited by the 2000 limit on training phrases for a single intent. Our use case is that we are attempting to recognize symptoms from the user. We have 26 different symptoms we are trying to recognize. We have 10s of thousands of rows of training data to train for these 26 symptoms. The upper limit of 2000 is hampering our end performance. Please advise.
Erin

and Reddy responded:

Hi Ekaram,
Thanks for joining today’s session!!
Default limit is 2000 training phrases per intent. This amount should be enough to describe all possible language variations. Having more phrases may make the agent performance slower. You can try to filter out identical phrases or phrases with identical structure.
You don't have to define every possible example, because Dialogflow's built-in machine learning expands on your list with other, similar phrases.
However, create at least 10 to 20 training phrases so your agent can recognize a variety of end user expressions.
Some of the best practices i would suggest is,
- Avoid using similar training phrases in different intents.
- Avoid Special characters.
- Do not ignore agent validation.
Let me know if that works.

A startup is a journey, and no startup founder will be able to get all the answers they need in one session. That’s why the Startups Roundtable series is ongoing; more business and technical roundtables will be coming soon. For now, if you are a startup founder looking for more opportunities to learn from the Google Startups Team and connect with other startup founders in the C2C community, register for these events for our startups group:
Use cases for artificial intelligence (AI) are so many and varied that the meaning of the term itself can be hard to pinpoint. The Google Cloud Platform supports a host of products that make specific AI functions easy to apply to the problems they’re designed to solve. Vision AI is a cloud-based application designed to make computer vision applicable in a wide variety of cases. But what is computer vision, exactly?

On December 8, 2021, C2C invited Eric Clark of foundational C2C partner and 2020 Google Cloud Partner of the Year SpringML to answer this question. Clark’s presentation, a C2C Deep Dive, offered an enriching explication of the concept of computer vision, as well as projections for its impact on the future of AI. Most notably, Clark used Vision AI to present multiple demonstrations of computer vision in action.

To set the stage for these real-world applications, Clark offered a breakdown of the essential functions of computer vision.

Next, Clark used real footage of traffic at a busy intersection to demonstrate how computer vision monitors this footage for incidents and accidents to calculate travel times.

To showcase Vision AI’s video intelligence capabilities, Clark uploaded a video and applied different tags to demonstrate how computer vision recognizes and identifies individual elements of different images.

Clark’s final demonstration was an in-depth look at several infrastructure maintenance use cases, starting with a look at how computer vision can be used to detect potholes and other impediments to safe road conditions.

Clark’s demonstrations made clear that Vision AI is as user-friendly as it is powerful, and Clark made sure at the end of his presentation to invite attendees to make a trial account on the Google Cloud Platform and try out the API themselves. Alfons Muñoz (@Alfons), C2C’s North American Community Manager, echoed his encouragement. “It’s really easy to try it out,” he said.

If you haven’t already, set up an account on the Google Cloud Platform and try using Vision AI for help with a current project, or even just for fun. Write us back in the community to let us know how it goes!
Machine Learning (ML) is a major solution business and technical leaders can use to drive innovation and meet operational challenges. For managers pursuing specific organizational goals, ML is not just a tool: it’s a mindset. C2C’s community members and partners are dynamic thinkers; choosing the right products for their major projects requires balancing concrete goals with the flexibility to ask questions and adapt. With these considerations in mind, C2C recently invited Google Cloud Customer Engineer KC Ayyagari to host a C2C Deep Dive on The ML Mindset for Managers.

Ayyagari started the session by asking attendees to switch on their cameras and then ran a sentiment analysis of their faces in Vision API.

After giving some background on basic linguistic principles of ML, Ayyagari demonstrated an AI trained to play Atari Breakout via neural networks and deep reinforcement learning.

To demonstrate how mapping applications can use ML to rank locations according to customer priority, Ayyagari asked the attendees for considerations they might take into account when deciding between multiple nearby coffee shops to visit.

As a lead-in to his talking points about the ML mindset for managers, Ayyagari asked attendees for reasons they would choose to invest in a hypothetical startup he founded versus one founded by Google’s Madison Jenkins. He used the responses as a segue into framing the ML mindset in the terms of the scientific method. Startup management should start with a research goal, he explained, and ML products and functions should be a means of testing that hypothesis and generating insights to confirm it.

Before outlining a case study of using ML to predict weather patterns, Ayyagari asked attendees what kinds of data would be necessary to use ML to chart flight paths based on safe weather. Guest Jan Strzeiecki offered an anecdote about the flight planning modus operandi of different airports. Ayyagari provided a unique answer: analyzing cloud types based on those associated with dangerous weather events.

The theme of Ayyagari’s presentation was thinking actively about ML: in every segment, he brought attendees out of their comfort zones to get them to brainstorm, just like an ML engineer will prompt its machines to synthesize new data and learn new lessons. ML is a mindset for this simple reason: machines learn just like we do, so in order to use them to meet our goals, we have to think and learn along with them.

Are you a manager at an organization building or training new ML models? Do any of the best practices Ayyagari brought up resonate with you? Drop us a line and let us know!
Using computer vision to collect and analyze image data is one of the most useful and in-demand ways to apply AI to cloud technology. In this C2C Deep Dive, Eric Clark of SpringML, a foundational C2C partner and 2020 Google Cloud Partner of the Year, offers an overview and demonstration of Google’s Vision AI, one of the most dynamic Google Cloud products SpringML and others in its space are using to build customized smart AI solutions.

The recording from this session includes:
(1:00) Speaker introduction and agenda overview
(3:40) SpringML at a glance
(4:45) What is computer vision?
(7:30) Computer vision in different industries
(14:00) Structured and unstructured data
(16:00) Vision AI options
(18:30) Video intelligence demonstration
(23:00) Infrastructure maintenance computer vision use cases
(32:30) Future of the computer vision market
(36:30) Next steps

Want to discuss computer vision at an upcoming interactive event? Join us on January 25 for a Google Cloud Startups AI and ML roundtable:
ML is an important component of every major tech product today. However, not everything beyond Excel sheets is big data, and not all big data problems require ML. The most important function of ML should be to supplement the product.

KC Ayyagari, Senior Customer Engineer, San Francisco Enterprise Sales, Google Cloud, presented this C2C Deep Dive.

The recording from this session includes:
(1:50) Introduction to speaker and agenda overview
(3:45) Live demo of Google Cloud’s Vision API and sentiment analysis
(7:20) Introduction to machine learning and its relation to AI and deep learning
(8:30) Teaching ML languages by using examples and reducing errors
(13:00) Example: using deep reinforcement learning for a neural network to play Atari
(17:40) Comparing ML to normal software development and representing physical problems in data
(28:35) The dos and don’ts of ML for managers
(40:45) Case study: predicting weather trends and flight plans with images of clouds
(47:50) Building a prototype with dataset search engine
(51:50) Keys to successful machine learning

Extra Credit:
- Dataset search engine

Looking for more? Join C2C and the Google Cloud Startup team on January 25, 2022. Sign up below:
On Tuesday, November 16, 2021, C2C hosted its first Google Cloud Startup roundtable event. This series, organized and planned specifically for representatives from startups looking to grow their businesses, brings these representatives together with Google Cloud Customer Engineers, Technical Specialists, and Startup Success Managers to lead discussions and answer questions on hot topics in the startup space. The first roundtable included group sessions for business leaders and technical staff as well as a Customer Engineer AMA, all exploring artificial intelligence (AI) and machine learning (ML), and the potential uses of each for startup businesses as they form and begin to scale.

After welcoming guests and introducing the Google staffers on the call, the event’s organizers invited attendees to join breakout rooms based on whether they had come with technical or business questions to discuss. These breakout rooms were not recorded, but C2C North America Community Manager Alfons Muñoz joined the technical discussion.

In this breakout room, startup founders from 86 Repair and Auralab brought their questions directly to Google’s customer engineers. According to Muñoz, “They were stating their problems or projects and getting an overview of how to approach these problems...and they had more than one overview, because we had more than one customer engineer, so they had more than one point of view. They also were encouraged to get in the community.”

Most of this event’s ninety minutes were spent in the breakout rooms, but after about an hour, the groups came together again for an AMA with all of the customer engineers on the call. In this session, the visiting startup founders revisited the topic that had dominated the conversations in the breakout rooms: data. In order to use ML effectively, an organization needs a platform that can store, host, and manage data reliably.

Google’s Deok Filho offered a canny on-the-spot breakdown of the relative advantages and disadvantages of integrating different Google and third-party data management tools with BigQuery, bringing in Mike Walker to field follow-up questions from Ben Collins of Auralab and Daniel Zivkovic, founder and curator of Serverless Toronto, along the way. Check out a clip of the conversation below:

According to Muñoz, in terms of connecting guests to the right Google staffers and getting their questions answered, this event was a success, but, in his words, “it’s important to note that this is the first of many roundtables.” Look for more of these events for startup founders in 2022, including the next AI and ML roundtable in January:
Teaching a machine model to think is one of the most challenging, and rewarding, tasks technology can accomplish. When you want your model to recognize images, you simply convert them into numbers, or vectorize them, in a process called “feature extraction” or “feature encoding.” For example, you may want to encode the image of a cat into the following features:
- The curved shape of the ear
- Color of the iris, red
- Paws, grey

But how do you train the model to recognize text? After all, text data is abstract; it’s composed of words with various conceptual referents. That’s where the bag-of-words (BoW) model comes in. Using this model, you place your words into one or more “bags,” or multisets, and vectorize them on a spreadsheet. This helps you classify documents, calculate probability, detect spam, and more. Read on to learn how the BoW model solves a series of common but critical problems.

Natural Language Processing: Understanding Text
What if I am working on an application with document-scanning capabilities and I want it to do more than just recognize text? I want to teach my ML model to understand one or more sentences. I can teach my algorithm how to convert images into binary form, but how am I going to train it on abstract text?

Solution
I convert the text data into binary metrics on a spreadsheet, just as I would with vectorized images.

Example
Sentence: “I like to go to the movies.”
The keywords tell my ML-trained model how to understand the gist of a sentence. In this case, the sentence theme is Like; Movies. I flip those keywords into binary metrics, thus: Like = 1; movies = 1. The other words (I, to, go, and the) are subordinate to the keywords, so I map them on my spreadsheet as 0s.

Now that I’ve trained my model to identify the theme in the sentence, it can proceed to do the heavy lifting, which is what it’s best at. In other words, my model, now trained through BoW, can predict, analyze, categorize, and so forth.
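As a rough illustration of the keyword encoding described above, the following Python sketch maps each distinct word of the example sentence to 1 if it is a keyword and to 0 otherwise. The helper names (`tokenize`, `binary_vector`) and the hand-picked keyword set are assumptions made for this example; they are not part of any library or of the original walkthrough.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def binary_vector(sentence, keywords):
    """Map each distinct word to 1 if it is a keyword, otherwise 0."""
    encoding = {}
    for word in tokenize(sentence):
        # setdefault keeps the first entry, so repeated words appear once
        encoding.setdefault(word, 1 if word in keywords else 0)
    return encoding


sentence = "I like to go to the movies."
keywords = {"like", "movies"}  # assumed: keywords are chosen by hand here

encoding = binary_vector(sentence, keywords)
print(encoding)
```

In this sketch, every non-keyword (including "go") maps to 0. In a real application the keyword set would more likely come from a learned vocabulary than from a hand-written set.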
Document Classification
I want my application to facilitate better sorting and organization of scanned documents. I need to train my model to tell me how many times certain keywords appear in certain sentences. How can the BoW model help?

Example
Sentence 1: “I like to go to the movies.”
Sentence 2: “I do not like movies like this.”
Each of these sentences itself is a BoW, since each is made up of a unique set of words. To determine how many times each word in the first sentence appears, I first tabulate the frequency of the words in each BoW:

Word     BoW (1)   BoW (2)
I        1         1
Like     1         2
To       2         0
Go       1         0
The      1         0
Movies   1         1

Then, I can count the total occurrences of each word by adding across both columns. For instance, the word “movies” appears twice in our combined bag of words.

Information Retrieval
I want to be able to search scanned documents for particular text data. To do so, I need to know whether certain words appear in more than one sentence. Here’s where I use the “both” feature.

Example
BoW 1: “I like to go to the movies.”
BoW 2: “I do not like movies like this.”
I can still use the same table, but this time I’ll add a column to keep track of which words appear in both sentences:

Word     BoW (1)   BoW (2)   Both
I        1         1         1
Like     1         2         1
To       2         0         0
Go       1         0         0
The      1         0         0
Movies   1         1         1

Unlike the words I, like, and movies, which appear in both sentences, the words to, go, and the only appear in BoW (1). Thus, I tag the first set of words (1) and vectorize the second set of words as (0).

Scoring the Importance of Certain Terms
When I refer back to my scanned documents, I want to be able to keep track of which information is most critical. Therefore, I want my model to score the frequency of certain key terms in the document as a whole.

Example
BoW 1: “I like to go to the movies.”
BoW 2: “I do not like movies like this.”
The BoW model is equipped with a specific feature that enables this kind of scoring: the term frequency-inverse document frequency (TF-IDF) feature:

Word     BoW (1)   BoW (2)   TF-IDF (1)   TF-IDF (2)
I        1         1
Like     1         2
To       2         0         2/7          0
Go       1         0
The      1         0
Movies   1         1

By comparing the frequency with which each word appears in each sentence to the number of words in the same sentence, the TF-IDF feature scores each word by frequency per sentence. The word to appears 2 times out of the 7 total words in the first sentence. It appears 0 times in the second sentence.

Probability
Finally, I want to make sure my scanned documents are coming from a trustworthy source. The bag-of-words method is frequently used for spam detection.

Example
Take these two phrases, which could easily pass as email subject lines:
- “Send money to me through PayPal”
- “Get rich today”

One is legitimate while the other is spam. How can I train my model to know which to delete? First, I use Bayes’ theorem of probability:

P(L/S) vs. P(S/S), where L = Legitimate and S = Spam

This theorem determines how likely a word is to be used in a spam email.

Then, I break each word string into keywords, assigning each keyword a matched probability. For example: PayPal = legitimate, given the probability of 0%. The words Money, Get, Rich, and Today are each weighted 10%.

Finally, I add up the spam weights of the keywords in each string to get my results:

Label    Phrase                               Spam score
Legit    “Send money to me through PayPal”    10%
Spam     “Get rich today”                     30%

As a result, I train my ML model to conclude that phrases like the second one are highly likely to be spam.

Other Uses
The examples above describe a series of use cases for the BoW model, but there are others, too.
Here are a few more potential uses for the BoW model:
- Sentiment analysis, also known as opinion mining, in which online text (such as social content) is mined to evaluate the writer’s attitude.
- Language modeling, to determine the probability of a given sequence of words occurring in a particular string of words.
- Computer vision, in which particular images are given the BoW treatment. In this case, the method is called the bag-of-visual-words model.

Flaws
In some contexts, using the bag-of-words model can introduce unintended problems. Watch out for these potential issues when using this model:
- Certain documents or input data may be too sophisticated, complex, or overly large for the limited BoW model.
- Too few significant words and too many words with no objective or practical meaning may result in too many null values, rendering the vectorization useless.
- If you forget a hyphen between words, one word, such as “home-run,” could be split into “home” and “run,” and then scored higher than it deserves, skewing results.
- Misspellings, such as “tank” instead of “thank,” or “gr8” for “great,” distort algorithmic results.
- BoW ignores linguistic nuances and context, so certain words or word strings could be scored higher than they deserve. This could be remedied with transformer-based deep learning models like Bidirectional Encoder Representations from Transformers (BERT), which use neural networks to better discern the context of words in search queries.

Many of these issues could also be remedied with Google Natural Language API, which applies natural-language understanding (NLU), or natural-language interpretation (NLI), to help computers understand and respond to humans in our own language.

Have you ever worked with the BoW model? Would the BoW model be useful for any projects in your ML workflow? Reach out and let us know what you’re thinking.
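The counting, shared-word, frequency-scoring, and spam-weighting steps from the bag-of-words walkthrough can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the helper names (`tokenize`, `term_frequency`, `spam_score`) are hypothetical, the weights are the hand-assigned percentages from the spam example, and the frequency score is the plain count-per-sentence ratio the walkthrough describes, which is only the term-frequency half of TF-IDF. Summing the weights is a simplification rather than a full Bayesian classifier, and it is what reproduces the 10% and 30% figures. Unlike the tables in the walkthrough, the sketch also counts the words that appear only in the second sentence (do, not, this).

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def term_frequency(bag):
    """Share of the sentence taken up by each word (count / total words)."""
    total = sum(bag.values())
    return {word: count / total for word, count in bag.items()}


def spam_score(phrase, weights):
    """Add up the hand-assigned percentage weights of the words in a phrase."""
    return sum(weights.get(word, 0) for word in tokenize(phrase))


bow1 = Counter(tokenize("I like to go to the movies."))
bow2 = Counter(tokenize("I do not like movies like this."))

# One row per word: (word, count in BoW 1, count in BoW 2, 1 if in both else 0)
vocabulary = list(dict.fromkeys([*bow1, *bow2]))
rows = [(w, bow1[w], bow2[w], int(bow1[w] > 0 and bow2[w] > 0)) for w in vocabulary]

tf1 = term_frequency(bow1)
tf2 = term_frequency(bow2)

# Assumed weights, in percent, taken from the spam example in the text.
weights = {"paypal": 0, "money": 10, "get": 10, "rich": 10, "today": 10}
legit = spam_score("Send money to me through PayPal", weights)
spam = spam_score("Get rich today", weights)

for row in rows:
    print(row)
print(tf1["to"], tf2.get("to", 0.0))
print(legit, spam)
```

For larger corpora, a library vectorizer would usually replace the hand-rolled counting shown here; the sketch keeps to the standard library so that each step in the tables stays visible.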
Michael Pytel (@mpytel), co-founder and CTO at Fulfilld, shares stories from the team’s wins and losses in building out this intelligent managed warehouse solution.

The recording from this Deep Dive includes:

- (2:00) Introduction to Fulfilld
- (10:15) Natural Language Processing use case for warehouse guidance
- (11:40) Generating directions using Dijkstra’s algorithm (commonly used in mapping applications) to connect the shortest route between two points
- (13:10) Generating audio guidance for a custom map using Google Cloud Run and Text-to-Speech API
- (14:15) Using WaveNet to create natural-sounding, multi-language voices for text-to-speech scenarios
- (16:45) Building a digital assistant with Google Dialogflow
  - Intent matching and other features
  - Other use case examples of Google Dialogflow
- (21:30) Integrating voice while building applications on Flutter
- (22:35) Natural language alerts for warehouse operations
- (23:50) Big ideas: looking to the future of Fulfilld

Other Resources

- WaveNet: A generative model for raw audio
- Google Cloud hands-on labs
- Google documentation: Creating voice audio files
- Build voice bots for mobile with Dialogflow and Flutter | Workshop
- The Definitive Guide to Conversational AI with Dialogflow and Google Cloud

Find the rest of the series from Fulfilld below:
From chatbots to predictive text, all kinds of applications are using AI to navigate language barriers and facilitate communication across different communities. Many of these applications focus on text, but there is more to language than written words. Sometimes even fluent speakers of a second language will experience challenges when communicating face-to-face with native speakers. One of the best ways to overcome these challenges is to practice pronunciation.

Markus Koy (@MarkusK) is an IT projects analyst with 18 years of experience across various industries. He is also a native German speaker living in an English-speaking part of Canada, and a regular visitor to C2C’s AI and ML coffee chats, which are hosted in the U.S. Koy’s experiences working in English-speaking countries as a non-native English speaker inspired him to create thefluent.me, an AI-powered app that tests speech samples and scores them based on how well they correspond to standard English pronunciation.

On thefluent.me, users record themselves reading samples of English text (usually about 400 characters long), and then post them either publicly or privately on the app’s website. Within about 30 seconds, the app delivers results, reproducing the text and indicating which words were pronounced well and which can be pronounced better. Even native English speakers may find that they can improve their pronunciation, sometimes even more so than someone who speaks English as a second language.

We recently approached Koy with some questions about thefluent.me, Google Cloud products, and his experience with the C2C Community. Here’s what we learned:

What inspired you to develop thefluent.me?
Koy began working on thefluent.me after contributing to a research project with an international language school.
As a second-language English speaker himself, he had already taken the International English Language Testing System; he had found pronunciation to be the hardest part of the process.

“Immediate feedback after reading a text is usually only available from a teacher and in a classroom setting,” he says. Teachers only listen to a speaker’s pronunciation once, and will likely not provide feedback on every word. Tracking progress systematically is just not feasible in a classroom setting, and sometimes non-native speakers will feel intimidated when speaking English in front of other students.

Koy continued his research on AI speech-recognition programs and also graduated from Google’s TensorFlow in Practice and IBM’s Applied AI specialization programs. He decided to build thefluent.me to help students struggling to overcome these challenges.

What makes thefluent.me unique?
There are many apps on the market for students studying English as a second language, and thefluent.me is not the only app of this kind that uses AI for scoring. However, apps combine different features to support distinct learning needs. Koy kept these concerns in mind when designing and building the following features for thefluent.me:

- Immediate pronunciation feedback: The application delivers AI-powered scoring for the entire recording and word-level scoring on an easy-to-understand scale.
- Immediate feedback on reading speed: Besides pronunciation, the application provides feedback on the reading speed for each word.
- Own content: Users can add posts they would like to practice instead of using only content published by platforms. They can immediately listen to the AI read their post before practicing.
- Progress tracking and rewards: Users can track their activities and progress. They can revisit previous recordings and scores, check their average score, and earn badges.
- Group learning experience: By default, user posts are not accessible to others. However, users can also make their posts public and invite others to try, or they can compete for badges.

How do you use the Google Cloud Platform? Do you have a favorite Google Cloud product?
Koy runs thefluent.me on App Engine Flexible. He likes how easy the deployment process is, especially when managing traffic between different versions. The two key Application Programming Interfaces (APIs) Koy is using are Speech-to-Text and Text-to-Speech; he says the WaveNet voices available in Text-to-Speech make the AI speech sound more natural. He also likes that both allow him to choose different accents for the AI speech. Koy is also using Cloud SQL and Cloud Storage, which he finds easy to integrate.

What do you plan to do next?
“There are many other items for horizontal and vertical scaling on my roadmap,” Koy assures us. He is planning to add additional languages and enhance the app’s group features. He has also been approached by multiple companies who want to use thefluent.me for education and training. Koy plans to publish APIs to accommodate these requests in the coming weeks.

Why did you choose to join the C2C community?
Like so many of our members, Koy joined the C2C community to meet people and collaborate, but his experience here has informed his work on thefluent.me beyond friendly conversation. Recently, a community member expressed to Koy that thefluent.me is an ideal tool to use when preparing for a job interview: a user can rehearse answers to interview questions to learn to pronounce them better. For Koy, this is not just nice feedback; it is also a use case he can add to his roadmap.

Still, community itself is enough of a reason for Koy to return on a weekly basis. “Mondays are just not the same anymore without our AI and ML coffee chats,” he says.
There are two ways to train a machine. The first is to train it to recognize objects by teaching it rules. The second is to train it to recognize objects by giving it examples. The first is called the rules-based approach. The second is called the machine learning (ML) approach.

Example:
You want a machine to produce certain wheels, identifiable by their company logos, rims, spokes, center caps, sizes, and other qualities. You feed the system data and rules so that it will produce these wheels, and no others. The problem with this approach is that a rule-heavy system becomes too challenging for you to maintain and too unwieldy for the system to apply. If you use ML, you can feed the computer examples of wheels with the correct logo, patterned center cap, number of spokes, size, and so on. The computer learns to produce the desired wheels through trial and error.

Rule-Based Learning
Rule-based systems have four basic components:

- Facts, or a domain of knowledge.
- An inference engine, which interprets the facts and takes appropriate actions through rules that include probabilistic, associative, or “If-Then” reasoning (“IF A happens THEN do B”).
- A temporary working memory for briefly “remembering” those rules.
- A user interface that allows developers to add, subtract, or change input and output signals.

The number of rules depends on the number of actions you want the system to handle, so 20 actions would require manually writing and coding at least 20 rules; the system is locked into following these rules.

Machine Learning
ML is modeled after human intelligence, with the assumption that machine systems can learn from experience and improve their performance accordingly. ML is achieved through:

- Supervised learning, whereby developers use labeled input and output data to train the system.
- Unsupervised learning, whereby systems draw their own conclusions from unlabeled data.
- Semi-supervised learning, which blends supervised and unsupervised learning.
- Reinforcement learning, whereby the system learns through trial and error.

In short, ML gives systems the ability to venture outside the box, adapt their “thinking,” and expand their capabilities.

When do you use ML? When do you use rules?
Each situation is different. In short: rules are easy to interpret, while ML models can be harder to inspect. Given enough data, ML tends to give more accurate results and is easier to maintain than a large rule-based system. A rule-based system is faster, easier, and cheaper to set up when the number of rules is small. Execution cost again depends on the model. If you have a system with a large number of actions, then you will want to use ML for faster, cheaper, and more effective results.

TL;DR:
Use a rules-based approach when:

- There is a small or fixed number of outcomes. For example, an “Add to Cart” button can either be clicked or not.
- There is little tolerance for false positives. Rules behave deterministically, which makes these errors easier to rule out.
- Your employer/team has neither the knowledge nor the resources for ML.

Use ML when:

- The system calls for a more flexible approach: the task is too complex or uncertain for rigid rules.
- Situations, data, and events are changing faster than you can write new rules.
- Linguistic nuances cannot be encapsulated by rigid rules. When you’re working on tasks that call for an understanding of language, you will want to use the adaptive capabilities of ML.

Google Cloud helps you build, deploy, and scale ML models faster, with pre-trained and custom tooling within its unified AI platform.

Extra Credit:

- Why Is Self-Supervised Learning the Future of Machine Learning?
- The Difference Between Virtual Machines (VMs) and Hypervisors
- What is Automated Machine Learning?
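The rule-based components described in this article (facts, an inference engine, working memory, and “IF A happens THEN do B” rules) can be sketched as a tiny Python program. This is only an illustration under assumed names: `run_rules`, the `RULES` table, and the “acme” logo with five spokes are hypothetical stand-ins for the wheel example, not part of any real system.

```python
def run_rules(facts, rules):
    """Tiny inference engine: fire every IF-THEN rule whose condition holds."""
    working_memory = dict(facts)  # temporary copy that the rules read from
    actions = []
    for _name, condition, action in rules:
        if condition(working_memory):   # IF A happens ...
            actions.append(action)      # ... THEN do B
    return actions

# Hypothetical rules for the wheel example: (name, IF condition, THEN action).
RULES = [
    ("wrong_logo", lambda f: f["logo"] != "acme", "reject: wrong logo"),
    ("wrong_spokes", lambda f: f["spokes"] != 5, "reject: wrong spoke count"),
    ("all_good", lambda f: f["logo"] == "acme" and f["spokes"] == 5, "accept"),
]

print(run_rules({"logo": "acme", "spokes": 5}, RULES))
# ['accept']
print(run_rules({"logo": "other", "spokes": 6}, RULES))
# ['reject: wrong logo', 'reject: wrong spoke count']
```

Every new wheel property needs another hand-written rule, which is the maintenance burden the article describes; an ML approach would instead learn the accept/reject boundary from labeled examples.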
If you work in IT or cloud computing, sooner or later you are apt to come upon the microservices/service-oriented architecture (SOA) debate. The two approaches are alike in that both break large, complex operations into smaller, more flexible components. Both scale to meet the speed and operational demands of a company’s growing data, and both involve cloud or hybrid cloud environments for deployment. From here on, opinions differ. Some developers say microservices are an improvement on SOA, while others say there are key differences.

Most important: Microservices are used for applications, while SOA is geared towards enterprises.

What Are Microservices?
Certain IT projects can be too complex or large to manage, test, and deploy as a whole, so software developers split them into small, independently deployable services, often packaged in containers. Each function has its own responsibility and its own team of developers. This helps the company speed up processes, cut costs, and fix problems in one area without taking down the operations of the whole. It also makes functions more effective and fault-resilient, among other benefits.

Example
Amazon.com divides into standalone categories (shipping, selling, customer support, etc.), where diverse teams develop and troubleshoot their particular application. That’s in contrast to the traditional monolithic architecture, where each category would be indistinct from the entire enterprise.

What is Service-Oriented Architecture (SOA)?
Service-oriented architecture (SOA) is just that. The enterprise constructs its IT system to deliver services rather than pivot around technical or operational aspects. In SOA software architecture, each function contains the code and data integrations it needs to deliver a particular service. As a result, the whole system is interoperable, which enhances efficiency, agility, and productivity.

Example
A single security service is split into diverse components for authentication, authorization, audit, policy, encryption, and so forth.
Each is furnished with its own code and focuses on its delimited responsibility. (Other functions could include checking a customer’s credit, logging on to a website, or processing an application.)

Differences Between SOA and Microservices
Some developers insist microservices are essentially an upgraded version of service-oriented architecture (SOA), while others find the two approaches complementary. Differences include:

- Microservices are leaner and more agile than SOA.
- Microservices are commonly built with open-source tooling.
- Microservices are standalone and smaller than most specialized components in SOA systems.
- Microservices are more granular and narrower in their communication than SOA.
- Microservices can be developed, deployed, and tested faster than functions in SOA. Their lifespan is shorter.

In technical terms:

- Microservices typically use lighter-weight protocols like HTTP REST, while SOA often relies on SOAP.
- In microservices, each service can choose its own communication protocol, while in SOA, services communicate through middleware called the enterprise service bus (ESB).
- SOA needs governance, while microservices can do without.

To bring it all together with a possible use case, consider this: Enterprise-oriented SOA uses a continual flow of information and computing signals, achieved through synchronous request/response protocols such as RESTful APIs. In application-scoped microservices, synchronous communication would add latency and weaken resilience. So microservices often use asynchronous communication, such as the publish/subscribe (Pub/Sub) model, which helps them gain agility.

Bottom Line
Both microservices and service-oriented architecture (SOA) can best be described as an army of small specialized services (soldiers) trying to conquer a massive problem together instead of one big fighter doing everything. Although some developers tag microservices as the lightweight version of SOA, the real difference is that SOA stakes out the enterprise while microservices focus on applications.
Either model helps managers save time and costs as it slices monolithic systems into components, making services easier to work with.

Which is best for you?
Both approaches speed up automation. Larger and more diverse enterprises could benefit from the broader and less granular SOA design. Smaller environments, including web and mobile applications, are easier to develop with microservice architecture.

Extra Credit
There’s a science, if not an art, to microservice/SOA applications, which is why entire courses and books dedicate themselves to this topic. Here are Google Cloud’s best practices for microservice performance.
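The publish/subscribe model mentioned in this article can be illustrated with a small in-memory broker. This is a sketch only, not Google Cloud Pub/Sub or any real messaging library: the `Broker` class, the `order.created` topic, and the shipping and billing subscribers are hypothetical names chosen for the example.

```python
from collections import defaultdict, deque

class Broker:
    """In-memory publish/subscribe broker (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._queue = deque()                  # pending (topic, message) pairs

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher only enqueues; it never waits for subscribers.
        self._queue.append((topic, message))

    def deliver(self):
        # Drain the queue, handing each message to every subscriber of its topic.
        delivered = 0
        while self._queue:
            topic, message = self._queue.popleft()
            for callback in self._subscribers[topic]:
                callback(message)
                delivered += 1
        return delivered

broker = Broker()
shipped, billed = [], []          # stand-ins for two independent services
broker.subscribe("order.created", shipped.append)
broker.subscribe("order.created", billed.append)

broker.publish("order.created", {"order_id": 1})
assert shipped == []              # nothing is processed at publish time
broker.deliver()
print(shipped, billed)
```

The publisher returns as soon as the message is queued, so a slow or failed subscriber does not block it. In a real deployment the queue would live in a separate messaging service rather than in the same process.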
Automated Machine Learning (ML) automates the steps in your ML workflow, including preparing the data, training the model, evaluating the model, tuning parameters, and generating predictions. This can make your work easier, less onerous, less time-consuming, cheaper, and more accurate.

Auto ML is an emerging trend in high tech, with some conspiracy theorists warning it will eliminate your tech job. No worries! Careers in data science are here to stay, and automation just gives you more opportunities!

The Purpose of Automated Machine Learning
Automated ML automates every part of your ML pipeline, from data preparation to product deployment. Features include:

- Cleaning the data - includes removing duplicate or irrelevant information, dealing with missing values, fixing structural errors, and handling outliers.
- Feature engineering - gives the model features that make it more likely to produce the predictive results you want.
- Model selection - chooses one of many candidate models for a predictive modeling problem.
- Hyperparameter tuning - selects the best settings for the model’s architecture and training process.
- Model deployment - integrates the model into the production environment and verifies that it produces desired results.

Data Preparation
Auto ML identifies your type of data: Boolean, discrete, continuous, or text. It also performs task detection. For example, it works out whether the problem is binary classification, multiclass classification, regression, clustering, or ranking. Finally, Auto ML checks whether your data is ready for that task.

Feature Engineering
Once the data has been cleaned and is ready for training, data scientists have the tedious task of preparing a suitable predictive model. Auto ML does all that work for you in minutes.

- Feature selection - chooses the best set of features for your model to help it predict as required.
- Data preprocessing - converts the raw (original) data into a readable format.
- Feature extraction - retains only the critical features and data that your model needs to become useful, eliminating anything redundant or irrelevant.
- Skewed data detection - eliminates or corrects skewed data (namely outliers that appear in the raw data, which will distort your results if you keep them).
- Missing values detection - fills in missing data (for example, if participants have omitted a survey question in the data fed to the model, the model inserts a 0).

Model Selection
Model selection includes finding the best type of model to use and the specific structure most suitable for a given dataset. This is followed by model evaluation, where automation helps you scrutinize the entire process, from validation procedures to error-rooting, analysis, and configuration.

Hyperparameter Tuning
Hyperparameters are the settings you choose before training, such as the learning rate or the regularization strength; the model’s parameters are then learned from the data. Done manually, tuning them can take a while, requiring familiarity with algorithms and their strengths and weaknesses. The work needs to be thorough and carefully designed. Unsurprisingly, there are few data scientists available for this critical step. Auto ML, however, does the task at a fraction of the cost and time, and with fewer errors!

Deployment
Auto ML helps you deploy the model as a web service to predict new data without writing code. It also allows you to test its generated predictions and fine-tune results.

Use Cases
Auto ML is most commonly used for the following functions:

- Proof of concept - To help you decide whether the design is feasible. For example, whether to proceed with a specific software application.
- Baseline model - Using a good-enough model for decent results, for example, testing on a previous project to guide you in your task.
- Deploy to production - Auto ML is used as an end-to-end tool to expedite, improve, and automate your labor.
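The hyperparameter tuning step described earlier can be sketched as a small grid search. This is a minimal illustration of what Auto ML tools automate, not how any particular product works: the one-feature ridge model, the toy data, and the candidate values are all assumptions made for this example. Here the regularization strength `lam` is the hyperparameter (chosen before training), while the weight `w` is the parameter (learned from the data).

```python
def fit_ridge(xs, ys, lam):
    """Fit y = w * x with an L2 penalty lam (closed form for one feature)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the predictions w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(train, valid, candidates):
    """Try every candidate hyperparameter and keep the one with the lowest validation error."""
    results = []
    for lam in candidates:
        w = fit_ridge(*train, lam)              # parameter learned on training data
        results.append((mse(w, *valid), lam))   # scored on held-out validation data
    best_error, best_lam = min(results)
    return best_lam, best_error

# Toy data, invented for this sketch.
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
valid = ([5, 6], [10.1, 11.9])

best_lam, best_error = grid_search(train, valid, [0.0, 0.1, 1.0, 10.0])
print(best_lam)  # 0.0
```

Real Auto ML systems search much larger spaces and typically use smarter strategies than an exhaustive grid, but the loop of train, validate, and compare is the same idea.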
Tools
The most popular Auto ML applications are:

- RapidMiner - Free student version available
- Dataiku - Free community version available
- DataRobot - Commercial
- H2O Driverless AI - Commercial

Google Cloud AutoML
Google Cloud AutoML has a range of services that include the following:

- AutoML Vision for object detection.
- Video Intelligence API for classifying video segments and tracking objects in videos.
- AutoML Natural Language for analyzing text and AutoML Translation for translating it.
- AutoML Tables for prediction and classification from structured data, like databases or spreadsheets.

Wrap-up
Auto ML often provides faster, more accurate outputs than hand-coded algorithms, saves companies money on training staff or hiring experts, and makes ML more accessible to novices or organizations that lack the funds to hire skilled data scientists. That said, Auto ML is here to improve your efficiency, not replace you. So, although you no longer need to be involved in the step-by-step ML process, you will still want to evaluate and supervise the model.

Let’s Connect!
Leah Zitter, Ph.D., has a Masters in Philosophy, Epistemology, and Logic and a Ph.D. in Research Psychology.
Python is an excellent tool for application development. It offers a diverse field of use cases and capabilities, from machine learning to big data analysis. This versatility has allowed Python to carve a real niche for itself in the computing world. And now, as DevOps becomes more and more cloud-based, Python is making its way into cloud computing as well.

However, that’s not to say that running Python can’t come with its own set of challenges. For example, applications that perform even the simplest tasks need to run 24/7 for users to get the most out of their capabilities, but this can take up a lot of bandwidth, literally.

Python can run numerous local and web applications, and it’s become one of the most common languages for scripting automation to synchronize and manipulate data in the cloud. DevOps engineers, operations teams, and developers use Python as a preferred language, mainly for its many open-source libraries and add-ons. It’s also the second most common language used on GitHub repositories. Today we’re talking about running Python scripts on Google Cloud and deploying a basic Python application to Kubernetes.

Requirements for Running Python Scripts on Google Cloud
Before you can work with Python in Google Cloud, you need to set up your environment. After that, you can code for the cloud using your local device, but you must install the Python interpreter and the SDK. The complete list of requirements includes:

- Install the latest version of Python.
- Use venv to isolate dependencies.
- Install your favorite Python editor. For example, PyCharm is very popular.
- Install the Google Cloud SDK.
- Install any third-party libraries that you prefer.

What Runs Python on Google Cloud?
Businesses all over the world can benefit from cloud hosting. Both cloud-native and hybrid structures have technological benefits like data warehouse modernization and levels of security compliance that help fortify the development process.
But running code on Google Cloud requires a proper setup and a migration strategy, specifically a Kubernetes migration strategy, if you intend to orchestrate containers. Generally speaking, however, any code deployed in Google Cloud is run by a virtual machine (VM). Kubernetes, Docker, and even Anthos make application modernization possible for large applications. In the case of smaller scripts and deployments, a customizable VM instance is adequate for running Python scripts on Google Cloud, and it lets you choose the processor size, the amount of RAM, and even the operating system for running applications.

Google Container Registry and Code Migration
To begin scheduling Python scripts on Google Cloud, teams must first migrate their code to the VM instance. Many experts recommend doing so through Google Container Registry, which stores Docker images built from a Dockerfile.

First, you must enable the Google Container Registry. The Container Registry requires billing to be set up on your project, which can be confirmed on your dashboard. Since you already have the Cloud SDK installed, use the following gcloud command to enable the registry:

    gcloud services enable containerregistry.googleapis.com

If you have images in third-party registries, Google provides step-by-step instructions with a sample script that will migrate them to the Registry. You can do this for any Docker image that you store on third-party services, but you may want to create new projects in Python that will be stored in the cloud.

Creating a Python Container Image
After you create a Python script, you can create an image for it. A Dockerfile is a text file that contains the commands to build, configure, and run the application; Docker uses it to build the image. The following example shows the content of a Dockerfile used to build an image:

    # syntax=docker/dockerfile:1
    # Start from a slim Python 3.8 base image
    FROM python:3.8-slim-buster
    WORKDIR /app
    # Install dependencies first so this layer is cached between builds
    COPY requirements.txt requirements.txt
    RUN pip3 install -r requirements.txt
    # Copy the application code into the image
    COPY . .
    # Start the Flask app, listening on all interfaces
    CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0"]

After you create the Dockerfile, you can build the image. Run the following command from the directory that contains the Dockerfile:

    $ docker build --tag python-docker .

The --tag option tells Docker what to name the image, and the trailing dot tells it to use the current directory as the build context. You can read more about creating and building Docker images here.

After the image is created, you can move it to the cloud. You must have a project set up in your Google Cloud Platform dashboard and be authenticated before migrating the container. The following command builds the image with Cloud Build and pushes it to Container Registry (replace PROJECT_ID with your own project ID):

    gcloud builds submit --tag gcr.io/PROJECT_ID/python-docker

The above basic commands will migrate a sample Python image, but full instructions can be found in the Google Cloud Platform documentation.

Initiating the Docker Push to Create a Google Cloud Run Python Script
Once the Dockerfile has been uploaded to the Google Container Registry and the Python image has been created, it’s time to initiate the Docker push command to finish the deployment and prepare the configuration files. A Google Cloud Run Python script requires creating two configuration files before a developer can claim the Kubernetes cluster and deploy to it.

The Google Cloud Run platform has an interface to deploy the script and run it in the cloud. Open the Cloud Run interface, click “Create Service” from the menu, and configure your service. Next, select the container pushed to the cloud platform and click “Create” when you finish the setup.

Deploying the Application to Kubernetes
The final step to schedule a Python script on Google Cloud is to create the service file and the deployment file. Kubernetes is commonly used to automate the deployment of Docker images to the cloud. Orchestration tools use a format called YAML (with the file extension .yml or .yaml) to set up configurations and instructions that will be used to deploy and run the application. Once the appropriate files have been created, it’s time to use kubectl to initiate the final stage of running Python on Google Cloud.
Kubectl is a command-line tool for running commands against Kubernetes clusters, covering tasks like deployments, inspections, and log viewing. It’s an integral step to ensure the Google Cloud Run Python script runs efficiently in Kubernetes, and it is the last leg of the migration process.

To deploy a YAML file to Kubernetes, run the following command:

    $ kubectl create -f example.yml

You can verify that your files deployed by running the following command:

    $ kubectl get services

Extra Credit

- The Easiest Way to Run Python In Google Cloud (Illustrated)
- Running a Python Application on Kubernetes - Medium article
- Running a Python application on Kubernetes - Open Source article
- Google Cloud Run – Working with Python
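The deployment file and service file mentioned in this article are not shown, so here is a hedged sketch of what they might contain, generated from Python to stay in the article’s language. Everything specific is an assumption: the app name `python-docker`, the image path with the `PROJECT_ID` placeholder, one replica, and a `LoadBalancer` service on port 80. Port 5000 is Flask’s default. The script prints JSON, which kubectl accepts in place of YAML.

```python
import json

APP = "python-docker"                       # hypothetical app name
IMAGE = "gcr.io/PROJECT_ID/python-docker"   # placeholder; use your own project ID
PORT = 5000                                 # Flask's default port

# Deployment: tells Kubernetes which container image to run and how many copies.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": APP}},
        "template": {
            "metadata": {"labels": {"app": APP}},
            "spec": {"containers": [
                {"name": APP, "image": IMAGE, "ports": [{"containerPort": PORT}]}
            ]},
        },
    },
}

# Service: exposes the pods selected by the "app" label on port 80.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": APP},
    "spec": {
        "type": "LoadBalancer",
        "selector": {"app": APP},
        "ports": [{"port": 80, "targetPort": PORT}],
    },
}

# A "List" wraps both objects so one file can hold the two manifests.
manifest_list = {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}
print(json.dumps(manifest_list, indent=2))
```

Redirecting the output to a file (for example `example.json`) gives a manifest that can be passed to `kubectl create -f` in the same way as the YAML file above.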
How do we get non-humans to talk to us, translate text from one language to another, read and understand our documents, summarize large volumes of text rapidly, and give us answers, all in real time? Because that’s exactly what machines called Alexa or Siri do, or the conversational AI on Capital One that tells me the answer to my question (and often gets it wrong), or Google search engines and the like that not only use autocorrect to assist me with my queries, but also spit up responses that answer them. In the same category, there are AI translators like Google Translate that instantly translate text from one language to another (I just hover my phone over the word and Google does the rest!), and plagiarism checkers like Grammarly for the editors of C2C to check whether this article’s plagiarized. (No fear!)

It’s not much different from teaching children or ESL students to read and speak English, or any language for that matter.

We do it through natural language processing, called NLP.

C2C Event Alert: Interested in NLP? Keep up with the Fulfilld story: Journey to Deployment and hear how they’re using NLP to build their product. You’ll connect directly with the CTO, @mpytel, @YoshEisbart, and development teams and can share your own expertise, provide feedback, and learn how they’re overcoming similar challenges.

What’s Natural Language Processing?
Natural language processing (NLP) traces back to the 1950s, when Alan Turing sought to determine whether a computer could mimic human responses. NLP is a two-step process:

- Scientists strip the training data to its rudiments for machines to work with. This is called “data preprocessing.”
- Scientists then use one or another machine learning technique to train the algorithm to understand and respond as required.

Here’s how it works.
Phase 1: Data Preprocessing
Computer scientists break the text down to its basics through the following steps:

1. Segmentation
The text is broken down into smaller constituent units, such as sentences or clauses.
Example:
The sentence “Digital assistants are mostly female because studies show you’re more attracted to a woman's voice” gets broken into:
“Digital assistants are mostly female”
“Studies show you’re more attracted to a woman's voice”

2. Tokenizing
We need the algorithm to understand the constituent words, so we “tokenize” those words.
Example:
“Digital assistants are mostly female”
We isolate each word: “Digital”. “Assistants”. “Are”. “Mostly”. “Female”.

3. Stop Words
We eliminate inessential words that are only there to make a sentence more cohesive. Common examples are “and”, “the”, “are”.
Example:
In “Digital assistants are mostly female”, it would be “Are”. “Mostly”.
Leaving us with: “Digital”. “Assistants”. “Female”.

4. Stemming
Now that we’ve broken down the document and hacked it to its essentials, we need to explain its meaning to our machine. We do that by pointing out that some words, such as skipping, skips, and skipped, are the same stem (skip) with different endings attached.

5. Lemmatization
We also consider the context and convert each word to its dictionary base form, called its “lemma”. Common examples are “am”, “are”, and “is”, which all share the lemma “be”.
Example:
In “Digital assistants are mostly female”, we tag the word “are” as the present plural form of “be”.

6. Part-of-Speech Tagging
Here’s where we explain the concept of nouns, verbs, adjectives, adverbs, and the like to the machine by adding those tags to our words.
Example:
“Studies (noun) show (verb) you’re (pronoun + verb) more (adverb) attracted (adjective) to (preposition) a (determiner) woman's (noun) voice (noun)”

7. Named Entity Tagging
We introduce our machine to pop culture references and everyday names by flagging names of movies, important personalities or locations, and so forth that may appear in the document.
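The first four preprocessing steps above (segmentation, tokenizing, stop-word removal, and stemming) can be sketched with nothing but Python’s standard library. This is a deliberately crude illustration, not how a real NLP toolkit works: the function names, the tiny `STOP_WORDS` set (which mirrors the example above), and the suffix-stripping stemmer are all assumptions made for this sketch. The segmenter also splits only at sentence-ending punctuation, so it would not split on “because” as the example does.

```python
import re

STOP_WORDS = {"and", "the", "are", "mostly"}  # hypothetical list, mirrors the example
SUFFIXES = ("ing", "ed", "s")                 # crude suffix list for stemming

def segment(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase a sentence and return its words."""
    return re.findall(r"[a-z']+", sentence.lower())

def remove_stop_words(tokens):
    """Drop words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(word):
    """Strip one common suffix, then undo a doubled final consonant."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            if suffix == "s" and word.endswith("ss"):
                continue  # leave words like "pass" alone
            word = word[: -len(suffix)]
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]  # "skipp" -> "skip"
            break
    return word

tokens = tokenize("Digital assistants are mostly female")
kept = remove_stop_words(tokens)
print(kept)                                               # ['digital', 'assistants', 'female']
print([stem(t) for t in kept])                            # ['digital', 'assistant', 'female']
print([stem(w) for w in ["skipping", "skips", "skipped"]])  # ['skip', 'skip', 'skip']
```

A rule-of-thumb stemmer like this makes mistakes on many English words; lemmatization, part-of-speech tagging, and named entity tagging need dictionaries or trained models and are left out of the sketch.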
Phase 2: Algorithm DevelopmentComputer scientists use different natural language processing methods to train the model to understand and respond accordingly. The two most common methods are: Machine learning algorithms like Naive Bayes to teach our models’ human sentiment and speech. Rules-based systems, namely human-made rules that scientists use to program algorithms. Example: Robots in Saudi Arabia get passports. IF AI Sophia lives in the Emirates. THEN she gets guaranteed nationality. What is NLP Used For?Natural language processing (NLP) is used for a variety of functions that include: Text classification, where you teach the algorithm to recognize and categorize text. Example: Gmail with its Gmail Spam Classifier that filters spam email. Text extraction, where an algorithm is fed a quantity of material and asked to rapidly summarize it. Example: Google Scholar that summarizes quantities of academic research material. Machine Translation, where the algorithm is trained to translate spoken or written words from one language to another. Natural language generation, where an AI cobbles sense from random items. Example: automated journalism, where an engine scrapes the web for news and returns a summary in seconds. Two Open Problems in NLPAs evolved as the field’s become, robots are still challenged in certain areas. These include: Context. Even the most sophisticated machines are challenged by ambiguous words. Example: You could tell an AI to meet you at the “bank” and they can go to the stream or to Wells Fargo. Likewise, you may tell the machine ‘You’re great!’, and the robot exclaims Thank you! When really you're frustrated - ‘You’re (grunt) great.’ The evolving use of language. The model needs to be dismantled to acquire updated language and trending expressions. Named Entity Recognition (NER). Recognizing names of “big shots” or famous companies is insufficient. 
Algorithms need to recognize items such as person names, organizations, locations, medical codes, quantities, monetary values, and so forth.
- Sophisticated vocabulary. To be truly helpful, NLP needs to acquire a broad and nuanced vocabulary. For most NLP software applications, that is (at the moment) beyond their reach.
Bottom Line
The wonder of natural language processing (NLP) is that these non-human machines are more intelligent and articulate than a regular random sampling of our human population. Their knowledge is immense, their linguistic skills incredible (the most sophisticated have mastered more than 100 languages), and their responses are mostly spot-on. Yet they lack context, emotions, slang, and the like. That’s our instructional challenge, and one where the Google Natural Language API is said to excel. On the other hand, some AI researchers believe machines may never acquire this human-level cognition. They’re machines, after all.
Let’s Connect!
Leah Zitter, Ph.D., has a Masters in Philosophy, Epistemology and Logic and a Ph.D. in Research Psychology.
Extra Credit
All of us regular people are awash in a world of the Internet of Things (IoT). That’s where we, as consumers, use WiFi-connected devices to control the world around us. The Industrial Internet of Things (IIoT), on the other hand, works through smart sensors rather than devices and refers to industries: health care, retail, agriculture, government, and so forth. The ramifications are significant and have more diverse applications with a world-changing impact. Internet of Things (IoT)In the broadest sense, the term IoT encompasses all the regular “dumb” things connected to the Internet, like smart toasters, attached rectal thermometers, and fitness collars for dogs. You use your internet-connected device, usually a smartphone, to “tell” the physically connected object how to act. For example, the device prompts the connected object to react when, where, and how you want it. It also feeds you real-time information on its physically connected object. Review some examples:Wearable devices and fitness trackers (e.g., Jawbone Up, Fitbit, Pebble). You program these accessories through the internet; they monitor your health. Home Automation (e.g., Nest, 4Control, Lifx). These internet-controlled applications monitor and control home features such as lighting, climate, entertainment systems, security, and appliances. Industrial asset monitoring (e.g., GE, AGT Intl.) is an internet-connected solution that remotely monitors and tracks your assets and facilities. Industrial Internet of Things (IIoT)Here’s where the world outside our doors uses the digitally connected world to feed it automatic and real-time reports on the safety, productivity, and economics of industries and their workers. Unlike IoT, communication comes to us through inherently programmed sensors rather than directly through our devices.Industry stakeholders use these smart sensors to receive immediate information on their assets that help them monitor, collect, exchange and analyze incoming data. 
In addition, entire cities operate off these sensors; they’re called smart cities. In effect, the whole developed world is one substantial Industrial internet of things since we’re all connected and interconnected through these sensors. Review some examples:Energy: water and sewage utility services rely on distributed but connected self-service water kiosks to gather real-time data on water quality. Health care: hospitals and healthcare institutions use networks of intelligent electronic devices to monitor patients' health status 24/7. The automotive industry: smart cars use sensors to “feel out” their environment and predict danger. Technologies That Fuel (I)IoTIoT and IIoT work through the following technologies:AI and ML that train these devices to respond as they do Cybersecurity for insulating their systems from attackers Cloud computing for storing their functionalities and data in cloud storage for scalability and security Edge computing brings their data storage closer to the actual location for faster response time Data mining that collates information on their experiences to prevent problems and improve their operations Pros and Cons of (I)IoTIt would be a sorry world without (I)IoT. Babies would be left crying; pets would be lost, thieves could more easily break into homes, more older people would die from falls, and so forth. That's as regards IoT. Now with IIoT, just think how many lives have been saved through heart and EKG monitors—products of IIoT. There’s also Amazon’s same-day shipping that’s achieved through IoT-programmed robots stocking shelves and loading trucks.On the other hand, IIoT can be extremely dangerous. All it needs is one malicious actor to crack one single endpoint of the system to place hundreds of thousands of lives at risk—or even to stall an entire country. Review some examples:Hijacking vehicles. Modern vehicles have an OBD II device that’s connected to the internet. 
So it’s difficult but not impossible for intelligent hackers to remote-hack these vehicles that include ambulances and planes and terrorize a nation.“Now I am become Death, the destroyer of worlds” - but also the creator of worlds aptly describes the ramifications of IIoT. Consequential! Related ConceptsOther terms that are slightly similar to (I)IoT are:M2M (machine to machine) communication, primarily used in the telecoms sector to refer to IP-transmitted data Web of Things that more narrowly relates to software architecture Industry 4.0. to name our ongoing revolutionary era of smart manufacturing and industrial automation Smart systems or Intelligent systems which use AI- and ML-trained innovations that help us manage and predict Pervasive computing for embedding computing into everyday objects that transforms them into intelligent thingsBottom lineThe rock-bottom difference between IoT and IIoT is that IoT is B2C (business-to-consumer), while IIoT is B2B (business-to-business). The first is user-centered, while the second deals with groups, communities, cities of people. As such, the second is more consequential than the first. Nevertheless, both categories provide valuable connectivity, efficiency, scalability, time savings, and cost savings for individuals and groups/ industries alike.When it comes to Google Cloud, its robust architecture provides IoT and IIoT operators with the tools they need to build the future. Let’s Connect!Leah Zitter, PhD, has a Masters in Philosophy, Epistemology and Logic and a PhD in Research Psychology.
Technology exists to advance and improve, but even the most cutting-edge developments in technology require us to ask questions that have been with us forever. Questions of ethics are never settled, and as the advancement and implementation of artificial intelligence become more and more rapid, these questions are as important as ever. C2C recently brought these questions to a discussion with Tobi Wole, a Berlin-based data analytics engineer who gave a presentation mapping principles essential to AI Ethics against the 10 Commandments. Below is a recording of the full discussion:And here’s Tobi’s presentation, “10 Commandments and AI Ethics” Rules to Live By Wole found some striking points of contact between the 10 Commandments––a foundational text of the ethics we live by today––and core ethical principles of AI. Commandment 5, “Honor your father and mother,” speaks to the need for human authority over AI technologies; for example, the University of Bologna’s “ethical knob,” which allows a human driver to take control of a self-driving car if necessary: Commandment Seven, “You shall not commit adultery,” offers a funny segue into some concerns related to privacy, particularly security and access control: Most relevant of all is Commandment nine, “You shall not bear false witness against your neighbour [sic.],” which translates directly to present considerations regarding authenticity and honesty in digital technology. Watch the clip below for Wole’s commentary and a look at the ultimate false witness: an AI deepfake of Barack Obama: Honoring Diversity of Thought and Opinion Wole’s presentation offered numerous examples of other ethical principles crucial to proper AI development during an open discussion with C2C team members and colleagues. When C2C’s Sabina Bhasin raised the question of “diversity of thought and opinion,” Jeff Branham brought up some issues he’s faced in his work with machine learning models. AIs collect and analyze data. 
How do we make sure these AIs use this data to provide customers with the insights they want? Guaranteeing that ethics is central to AI developers’ decision-making process is not just a formality; it’s for our common good. Automation and Regrowth Tobi’s knack for comparison reaches beyond the 10 Commandments. When Danny Pancratz, C2C’s Director of Product, raised some concerns about automation and the necessity of higher-level work for employees whose jobs AI technology might replace, Wole likened the problem to cutting down a tree. If you’re going to cut something down––whether it’s a tree, or someone’s job––make sure you are preparing for more to grow back in its place. Who is Responsible for AI Ethics? Ethical questions tend to return us to our most basic values and beliefs, and in a way, the conversation ended where it began. Tobi interpreted the first Commandment, “You shall have no other gods before me,” as a principle of accountability. Branham asked the final question and wanted to know where the discussion of ethics should live on the teams that model AI. Oluwole mentioned developers and product teams, as well as management and executive management, but ultimately offered one clear answer: everyone who works in AI should be thinking about questions of ethics.
This C2C Deep Dive was led by Sindhu Adini (@Sindhu Adini) , director of Google Cloud for HLS at SpringML, a C2C foundational platinum partner. Joining Sindhu from the Peerlogic team were CEO Ryan Miller (@ramill401) and Alex Maskovyak, engineering and product development executive.Peerlogic is an innovative provider of cloud communications, building better conversations and high production through the power of AI. Their products allow individuals to work more productively, teams to collaborate more freely, and organizations to better understand their data.The full recording from this session includes:(2:05) Speaker introductions (4:20) SpringML’s specializations and industry reach (6:50) An introduction to Peerlogic and how they are empowering dental practices with improved communications between staff and patients (12:05) Analyzing patient sentiment with AI and ML Adopting call center best practices and front desk assistance Identifying revenue leakage Benchmarking and understanding conversion opportunities (17:10) Overview of Peerlogic’s application (21:15) Google Cloud services and components used Choosing Google Cloud Spectrum of AI in the Google Cloud ecosystem Google Cloud Vertex AI for pre-trained APIs and end-to-end integration for data and AI (31:25) Architectural overview of the solution and model Data pipeline to ingest audio scripts and Google Cloud Speech-to-Text Enhanced augmentation of the solution using custom ML algorithms FireStore to authenticate AppEngine access only to Service Accounts (38:50) Key considerations for Machine Learning Identifying the business problem that needs to be solved How predictions are made Supervised learning (44:55) Custom patient call analysis modelWatch the full recording below: Connect with SpringML here in the C2C Community.
The most popular machine learning (ML) approaches are supervised learning, unsupervised learning, and reinforcement learning. Each of these approaches has its limitations. In 2017, Yann LeCun developed self-supervised learning, where machines teach themselves with none of the expense and all the sophisticated results of machines fed on supervised learning. Curious about how this unlocks more efficiency? Read on.
Problems with ML Models
Each of our common ML approaches has its problems. With supervised learning, you need a huge amount of labeled data, which is expensive and time-consuming to produce. Models fed on the unlabeled data of unsupervised learning handle only relatively simple tasks like clustering and grouping. With environment-based reinforcement learning, algorithms are biased by their environment, typically producing distorted results. Along comes self-supervised learning, an approach where machines supposedly teach themselves.
What Is Self-Supervised Learning?
Developed by computer scientist Yann LeCun in 2017, self-supervised learning has become the hottest thing in AI. Essentially, LeCun suggests that machines could learn just as children do, namely with a synthesis of supervised and unsupervised learning that they absorb from their environment. Children develop cognitively through exposure to the equivalent of supervised and unsupervised learning. It’s supervised learning in that teachers train them on batches of labeled data. They’re shown images and taught, for example, “This man was George Washington.” At the same time, they instinctively learn to reason, identify patterns, and predict as an innate function of their minds. Life throws them “unlabeled data,” and they automatically draw conclusions.
That’s where unsupervised learning comes into play.
For machines to do the same, LeCun turned to a natural language processing (NLP) tool called “transformers.”
Transformers
Transformers use NLP principles to “transform” a simple image or caption into a fount of insights by probing part of a data example to figure out the remaining part. That data can be text, images, videos, audio, or anything else.
Transformers examine the data by running it through “encoders” and “decoders” that dissect the input into various outputs.
The encoder maps the given data onto an n-dimensional vector. That abstract vector is fed into the decoder, which produces a sequence of language, symbols, a copy of the vector, and so forth, depending on which results the operator wants. The end results are the same as those of supervised models, without the extensive and expensive batches of labeled data. Self-supervised models also produce more sophisticated insights than unsupervised models.
Self-Supervised Learning: Uses
Self-supervised learning mostly focuses on computer vision and NLP capabilities. Operators use it for tasks that include the following:
- Colorization, for coloring grayscale images
- Context filling, where the technology fills a space in an image or predicts a gap in a voice recording or text
- Video motion prediction, where SSL provides a distribution of all possible video frames after a specific frame
Examples of SSL in Everyday Life
Health Care and Medicine
SSL improves robotic surgery and monocular endoscopy by estimating dense depths in the human body and the brain. It also enhances medical visuals with improved computer vision technologies such as colorization and context filling.
Autonomous Driving
Self-supervised AI helps autonomous vehicles “feel” the roughness of the terrain when off-roading. It also provides depth estimation, helping them identify the distance to other vehicles, people, or objects while driving.
Chatbots
SSL equips chatbots with mathematical symbols and language representations.
As sophisticated as chatbots are, they lack the human ability to contextualize. Final Thought Self-supervised learning is the hottest item in AI, but it still grapples with an intuitive grasp of the visual, oral, or thematic context. SSL-fed chatbots, for example, still have a hard time understanding humans and fielding a relevant conversation. That said, self-supervised learning has contributed to innovations across fields to the extent that popularizers call it the next step towards robot perfection. Let’s Connect! Leah Zitter, Ph.D., has a masters in philosophy, epistemology, and logic and a Ph.D. in research psychology.
Photo by Ivan Aleksic on Unsplash
Hello, developers! If you have worked on building deep neural networks, you might know that building neural nets can involve a lot of experimentation. In this article, I will share some tips and guidelines that I find pretty useful and that you can use to build better deep learning models, making it a lot more efficient for you to stumble upon a good network.
You may need to choose which of these tips are helpful in your scenario, but everything mentioned in this article could straight up improve your models’ performance.
A High-Level Approach for Hyperparameter Tuning
One of the painful things about training deep neural networks is the many hyperparameters you constantly have to deal with. These could be your learning rate α, the discounting factor ρ, and epsilon ε if you are using the RMSprop optimizer (Hinton et al.), or the exponential decay rates β₁ and β₂ if you are using the Adam optimizer (Kingma et al.). You also need to choose the number of layers in the network or the number of hidden units for the layers; you might be using learning rate schedulers and want to configure those, and a lot more! We need ways to organize our hyperparameter tuning process better.
The algorithm I usually use to organize my hyperparameter search is random search. Though improvements to this algorithm exist, I typically end up using plain random search. Let’s say, for this example, you want to tune two hyperparameters, and you suspect that the optimal values for both are somewhere between one and five. The idea is that, instead of systematically picking 25 values to try out [like (1, 1), (1, 2), etc.], it is more effective to select 25 points at random.
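As a rough illustration of the idea, the sketch below compares a 5×5 grid with 25 randomly sampled points for two hyperparameters in the range one to five. The `toy_score` function is a made-up stand-in for "train the model and return validation accuracy"; it and the fixed seed are assumptions for the sake of a runnable example.

```python
import itertools
import random

LOW, HIGH = 1.0, 5.0
rng = random.Random(0)  # fixed seed so the sketch is reproducible


def toy_score(a, b):
    """Hypothetical objective: peaks near a=2.3, b=4.1 (stands in for validation accuracy)."""
    return -((a - 2.3) ** 2) - ((b - 4.1) ** 2)


# Systematic grid: (1, 1), (1, 2), ..., (5, 5) gives 25 trials
grid_points = list(itertools.product([1.0, 2.0, 3.0, 4.0, 5.0], repeat=2))

# Random search: 25 trials drawn uniformly from the same square
random_points = [(rng.uniform(LOW, HIGH), rng.uniform(LOW, HIGH)) for _ in range(25)]

# The grid only ever tries 5 distinct values per hyperparameter,
# while random search tries a fresh value on every trial.
grid_values_tried = len({a for a, _ in grid_points})
random_values_tried = len({a for a, _ in random_points})

best_random = max(random_points, key=lambda p: toy_score(*p))
```

Both strategies cost 25 training runs here, but random search explores many more distinct values of each individual hyperparameter, which helps when one hyperparameter matters far more than the other.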
(Figure based on lecture notes of Andrew Ng.)
Here is a simple example with TensorFlow where I use random search on the Fashion-MNIST dataset for the learning rate and the number of units:
Random Search in TensorFlow
I will not be talking about the intuition behind doing so in this article. However, you could read about it in this article I wrote some time back.
Use Mixed Precision Training for Large Networks
Growing the size of the neural network usually results in improved accuracy. As model sizes grow, the memory and compute requirements for training these models also increase. With mixed-precision training, according to Paulius Micikevicius and colleagues, the idea is to train deep neural networks using half-precision floating-point numbers so that large neural networks train faster with no or negligible decrease in performance. However, I would like to point out that this technique should be used only for large models with more than 100 million parameters.
While mixed precision will run on most hardware, it will only speed up models on recent NVIDIA GPUs (for example, Tesla V100 and Tesla T4) and on Cloud TPUs. To give you an idea of the performance gains from mixed precision: when I trained a ResNet model on my Google Cloud Platform Notebook instance (with a Tesla V100), I saw almost a 3x speedup in training time, and almost 1.5x on a Cloud TPU instance, with next to no difference in accuracy. To further increase your training throughput, you could also consider using a larger batch size (since we are using float16 tensors, you should not run out of memory).
It is also rather easy to implement mixed precision with TensorFlow; the code to measure the above speedups was taken from this example.
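To see why the technique is called "mixed" rather than simply "half" precision, the following sketch uses only the Python standard library to round values to float16. It shows that a half-precision value takes half the memory of a single-precision one, and that a very small gradient underflows to zero unless it is scaled up first. Loss scaling is the remedy described in the mixed-precision paper; the gradient value and the scale factor of 1024 are hypothetical numbers picked for illustration, and none of this is the TensorFlow code referred to above.

```python
import struct


def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Storage cost per value: half precision needs half the bytes of single precision.
BYTES_FP16 = struct.calcsize("<e")
BYTES_FP32 = struct.calcsize("<f")

small_gradient = 1e-8  # hypothetical gradient, below float16's smallest subnormal
loss_scale = 1024.0    # hypothetical loss-scale factor

# Stored directly in float16, the gradient is flushed to zero...
underflowed = to_float16(small_gradient)

# ...but if it is scaled up before the cast, it survives, and can be
# unscaled afterwards in higher precision.
scaled = to_float16(small_gradient * loss_scale)
recovered = scaled / loss_scale
```

This is why mixed-precision setups typically keep a higher-precision copy of the weights and scale the loss: the speed and memory benefits come from float16, while float32 protects the small values.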
If you are looking for more inspiration to use mixed-precision training, here is an image by Google Cloud demonstrating the speedup for multiple models on a TPU:
Speedups on a Cloud TPU
Use Grad Check for Backpropagation
In multiple scenarios, I have had to implement a custom neural network, and implementing backpropagation is usually the part that is prone to mistakes and difficult to debug. With an incorrect backpropagation, your model may even learn something that looks reasonable, which makes it still more difficult to debug. So, how cool would it be if we could implement something that lets us debug our neural nets easily?
I often use gradient checks when implementing backpropagation to help me debug it. The idea here is to approximate the gradients using a numerical approach. If the approximation is close to the gradients calculated by the backpropagation algorithm, you can be more confident that backpropagation was implemented correctly.
For now, you could use this expression to get a vector, which we will call dθ[approx]:
Calculate approx gradients
In case you are looking for the intuition behind this, you could find more about it in this article by me. So, now we have two vectors: dθ[approx] and dθ (calculated by backprop). These should be almost equal to each other. You could simply compute the Euclidean distance between these two vectors and use this reference table to help you debug your nets:
Reference Table
Cache Datasets
Caching datasets is a simple idea, but one I have not seen used much. The idea here is to go over the dataset in its entirety and cache it either in a file or in memory (if it is a small dataset). This should save you from performing some expensive CPU operations, like file opening and data reading, during every single epoch.
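The caching idea can be sketched in a few lines of plain Python. Here `load_example` is a hypothetical stand-in for an expensive read-and-decode step, and `CachedDataset` is an illustrative helper, not a library class; in TensorFlow the comparable building block is the `cache()` method on `tf.data.Dataset`.

```python
load_calls = 0  # counts how often the expensive loader actually runs


def load_example(index):
    """Hypothetical stand-in for opening a file and decoding one example."""
    global load_calls
    load_calls += 1
    return [index * 0.5, index * 2.0]


class CachedDataset:
    """Loads every example once, then serves all later epochs from memory."""

    def __init__(self, size, loader):
        self._size = size
        self._loader = loader
        self._cache = None

    def __iter__(self):
        if self._cache is None:
            # First pass over the data: pay the full loading cost once.
            self._cache = [self._loader(i) for i in range(self._size)]
        return iter(self._cache)


dataset = CachedDataset(size=100, loader=load_example)
examples_seen = 0
for epoch in range(3):
    for example in dataset:
        examples_seen += 1  # a training step would go here
```

After three epochs the model has seen 300 examples, but the loader ran only 100 times. An in-memory cache like this one only suits datasets that fit in RAM; larger datasets would need the file-based variant.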
Well, this also means that your first epoch will take comparatively more time, since that is when you perform all operations like opening files and reading data before caching the results. The subsequent epochs should be a lot faster since they use the cached data.
This is a very simple idea to implement, indeed! Here is an example with TensorFlow showing how easily one can cache datasets, along with the speedup from doing so:
Example of Caching Datasets and the Speedup with It
Common Approaches to Tackle Overfitting
If you have worked on building neural networks, overfitting or underfitting is arguably one of the most common problems you face. This section talks about some common approaches that I usually use when tackling these problems. You probably know this, but high bias causes us to miss a relation between features and labels (underfitting), and high variance causes the model to capture noise and overfit to the training data.
I believe the most promising way to solve overfitting is to get more data, though you could also augment data. A benefit of very deep neural networks is that their performance continues to improve as they are fed larger and larger datasets. However, in a lot of situations, it might be too expensive (or simply infeasible) to get more data, so let’s talk about a couple of methods you could use to tackle overfitting.
Broadly, there are two ways to do this: changing the architecture of the network or applying some modifications to the network’s weights. A simple way to change the architecture so that it doesn’t overfit is to use random search to stumble upon a good architecture or to try pruning nodes from your model.
We already talked about random search, but in case you want to see an example of pruning, you could take a look at the TensorFlow Model Optimization Pruning Guide.
Some common regularization methods I tend to try out are:
- Dropout: Randomly drop x% of a layer's inputs during training.
- L2 Regularization: Force weights to be small, reducing the possibility of overfitting.
- Early Stopping: Stop the model training when performance on the validation set starts to degrade.
Thank You
Thank you for sticking with me till the end. I hope you will benefit from this article and incorporate these tips in your own experiments. I am excited to see if it helps improve the performance of your neural nets, too. If you have any feedback or suggestions for me, please feel free to send them over! @rishit_dagli
Energy disruption rests on two planks. First, people use more renewables (and less dirty energy), which in turn comes from lowering the costs of renewables and decreasing the use of dirty energy. Second, energy disruption, and a cleaner world, comes from collating the millions of sensors and Internet of Things (IoT) devices installed each year and managing them on one pane of glass.
AI achieves both objectives through capabilities that allow consumers, retailers, and businesses to predict, analyze, and classify energy data in order to control it. AIOps helps engineers collate all energy-related data on one pane of glass and monitor, predict, and implement energy controls in real time through AI functionalities.
AI Enables Adaptive Controls
AI makes our world more energy efficient through three of its capabilities: prediction, classification, and insight generation.
Prediction problems include questions like: Can I predict whether this will or will not occur, given the available data? In the retail and commercial space, fault prediction and dynamic maintenance are among the most straightforward uses of AI and help operators predict equipment failures. AI does this by using sensor data from various units, and it significantly reduces downtime and maintenance costs. For example, DeepMind, part of Google, uses reinforcement learning to reduce energy use in Google's data centers by 15%. In Cambridge, Origami Energy uses ML to predict asset availability and market prices in near real time.
Then, there is insight generation, where I can get information from the available big data collated by AI. In a commercial and retail setting, AI models learn from individual data and then issue changes to individual units.
For example, U.K.-based electricity and gas utility company National Grid uses AI learning tools to better forecast demand on the system, aiming to reduce Britain’s energy usage by 10%.
As a practical example, I’ve seen that a massive surge in my energy use comes from running my washing machine on certain days. I use this data to problem-solve ways to slash the energy increase. I group incoming data from my energy devices into categories, enabling me to deduce where I can reduce energy usage.
For Devs and Architects
For IT developers and AI engineers, AI achieves efficient energy management by helping engineers collate and manage all AI-driven assets on one pane of glass. These internet-connected devices include generation assets, smart buildings, IoT and smart meters, and EV and mobility. This smart microgrid helps engineers do the following:
- Monitor all energy assets (for example, solar, EV, smart buildings, or distributed energy), regardless of form.
- Control and optimize energy assets in real time, trying to regulate ecosystems at the lowest possible cost. This is done through ML, gleaning insights from the mass of accumulated data on energy generation and consumption (for example, whether the solar system will generate solar energy in the next 15 minutes). Engineers manage these assets based on the ML algorithm.
- Collect insight from aggregated big data, for example, from renewable energy assets and electricity tariffs.
- Analyze and predict how the energy assets perform, adjusting algorithms based on generation and consumption patterns, like providing scheduling information for energy providers. Through these controls, engineers can forecast and prevent potential problems.
Looking Forward
Our world would be a less healthy, more energy-dense place if it weren’t for AI. From platforms like Google Fuchsia OS, you can supervise, regulate, and control your energy assets to cut costs, predict how they will work in the future, and introduce innovation, among other items.
For customers, business owners, and developers, AI helps them see in real-time how their energy assets perform and decide how to recalibrate to cut energy usage, use more renewable energy, or more wisely manage their energy devices. Let’s Connect! Leah Zitter, Ph.D., has a master’s in philosophy, epistemology, and logic and a Ph.D. in research psychology. Extra Credit This piece kicks off our week all around Earth Week, thanks for reading. We have a lot more for you through the week, check it out here. Don’t forget to sign up and join us on Friday for a community discussion around all topics related to Earth Week!