Using knowledge graphs to replace analysts

Recently, CB Insights published a FinTech report titled “The State Of Fintech Q3’20 Report: Investment & Sector Trends To Watch”. This post shows how a knowledge graph can be used to automatically generate such a report.

About knowledge graphs

A knowledge graph consists of two building blocks: nodes and relationships. A node represents an entity (a company, an industry, etc.). A relationship represents how two nodes are connected. In this case, the knowledge graph’s nodes are:

- companies
- the offerings that these companies launched
- the industries in which these companies operate
- the companies that invested in these companies

The relationships are:

- company “IS IN” industry
- company “INVESTED IN” company
- company “LAUNCHED” offering

The graph below shows an excerpt of the whole graph. For instance, you see:

- three companies that invested in Revolut
- that Revolut is in the FinTech industry
- that Revolut launched commission-free stock trading

With this in mind, the remainder of this post shows how graph algorithms for anomaly detection can generate the above-mentioned report.

Anomaly detection

Anomaly detection is used to find rare nodes or relationships that differ significantly from the rest of the data. Such nodes or relationships signal unoccupied industries or segments for innovations or new markets. Consider this insight from
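
The post does not say which tooling is used, but a minimal sketch, here built with networkx in Python, makes the node and relationship structure concrete. The investor node (“Investor A”) and the final degree-based check are illustrative assumptions, not the post’s actual anomaly-detection algorithm.

```python
# Minimal sketch of the FinTech knowledge graph described above (assumes networkx;
# the investor node and the degree-based check are illustrative, not the post's method).
import networkx as nx

g = nx.MultiDiGraph()

# Nodes: companies, industries, offerings
g.add_node("Revolut", label="Company")
g.add_node("FinTech", label="Industry")
g.add_node("Commission-free stock trading", label="Offering")
g.add_node("Investor A", label="Company")  # hypothetical investor

# Relationships, following the post: IS IN, INVESTED IN, LAUNCHED
g.add_edge("Revolut", "FinTech", type="IS_IN")
g.add_edge("Revolut", "Commission-free stock trading", type="LAUNCHED")
g.add_edge("Investor A", "Revolut", type="INVESTED_IN")

# A crude stand-in for anomaly detection: an offering launched by only one
# company could hint at an unoccupied segment.
for node, in_deg in g.in_degree():
    if g.nodes[node].get("label") == "Offering" and in_deg == 1:
        print(f"Potentially unoccupied segment: {node}")
```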

Going beyond the article headline: what knowledge graphs reveal

I recently read this article: Qlik übernimmt Blendr.io (“Qlik acquires Blendr.io”). From the article it was clear what happened: Qlik, a business intelligence company, acquired Blendr.io, a data integration company. However, I was wondering: What am I missing? What information is not evident from the article alone? To answer this question I built a knowledge graph. The knowledge graph consists of companies, industries, and company strategies that are related to Qlik and Blendr.io. An overview of the knowledge graph is shown in the image below. The meanings of the colors are as follows:

- orange: represents a company
- blue: represents an industry
- red: represents a company strategy

There are several reasons why such a knowledge graph can answer the question ‘What am I missing?’. They are explained below.

There is more to it than just ‘Qlik acquired Blendr.io’

The image below shows two graphs:

- in the upper left corner you see the basic graph that represents ‘Qlik acquired Blendr.io’
- in the rest of the image you see the full picture with all the relationships that Qlik and Blendr.io have

It is immediately clear that ‘Qlik acquired Blendr.io’ is too simplistic. If you look at the whole graph, you immediately notice: industry outsiders like Google or PwC are active
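
To make the contrast between the headline and the full picture concrete, here is a small sketch, again with networkx; the nodes besides Qlik and Blendr.io and the relationship types are illustrative assumptions, not the graph from the post.

```python
# Sketch contrasting the bare headline edge with the wider neighborhood
# (assumes networkx; nodes besides Qlik and Blendr.io are illustrative).
import networkx as nx

g = nx.Graph()

# The headline alone: a single relationship
g.add_edge("Qlik", "Blendr.io", type="ACQUIRED")

# Context from the wider graph: industries, outsiders, strategies
g.add_edge("Qlik", "Business Intelligence", type="IS_IN")
g.add_edge("Blendr.io", "Data Integration", type="IS_IN")
g.add_edge("Google", "Data Integration", type="ACTIVE_IN")  # industry outsider
g.add_edge("PwC", "Data Integration", type="ACTIVE_IN")     # industry outsider

# Everything within two hops of Qlik -- the "full picture" rather than the headline
context = nx.ego_graph(g, "Qlik", radius=2)
print(sorted(context.nodes()))
```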

Understanding company decisions using knowledge graphs

Recently I read this article (Cloudera adds data engineering, visualization to its Data Platform) and wondered why it was relevant. To an industry expert the implications are clear, but to me, an industry novice, they are not. This situation made me wonder how a knowledge graph can be used to automatically answer the “why” behind a company’s decision. Based on an innovation process used in patent analysis, I came up with this process:

1. I manually identified a similar article (Tech and Antitrust Follow-up, Google Buys Looker, Salesforce Buys Tableau; paywall).
2. From this article, I constructed the knowledge graph below.
3. I defined explicit relationships (as mentioned in the article) and implicit relationships (in green; these relationships can be inferred from an external knowledge source like Wikipedia).

The knowledge graph depicts the following points:

- Google’s biggest competitors (Microsoft and Amazon) own tools in the data analytics and visualization segment (Looker’s segment)
- the data analytics and visualization segment is important to Google because that segment is part of the “Big Data movement” (implicit relationship) and “Big Data” is important to Google (implicit relationship)
- Google does not own any tools in the analytics segment

Based on this information, it makes sense that Google acquired a tool in
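
A small sketch of how the explicit/implicit distinction can be encoded and then queried, again assuming networkx; the edge set is an illustrative reconstruction of the graph described above, not the author’s actual data.

```python
# Sketch of explicit vs. implicit relationships (assumes networkx; the edge set
# is an illustrative reconstruction, not the author's actual graph).
import networkx as nx

g = nx.DiGraph()

# Explicit relationships (stated in the article)
g.add_edge("Google", "Looker", kind="explicit", rel="ACQUIRED")
g.add_edge("Microsoft", "Power BI", kind="explicit", rel="OWNS")
g.add_edge("Amazon", "QuickSight", kind="explicit", rel="OWNS")
g.add_edge("Looker", "Data analytics and visualization", kind="explicit", rel="IS_IN")

# Implicit relationships (inferred from an external source such as Wikipedia)
g.add_edge("Data analytics and visualization", "Big Data", kind="implicit", rel="PART_OF")
g.add_edge("Big Data", "Google", kind="implicit", rel="IMPORTANT_TO")

# "Why did Google buy Looker?" -- follow the chain from the acquired company's
# segment back to the acquirer, which runs through the implicit edges.
path = nx.shortest_path(g, "Looker", "Google")
print(" -> ".join(path))  # Looker -> Data analytics and visualization -> Big Data -> Google
```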

Is AI to Tableau what VLOOKUP is to Excel?

As Ben Thompson from Stratechery wrote on Google’s acquisition of Looker, “data analytics and visualization is a large and growing segment in enterprise software”. As Boris Evelson from Forrester points out, BI tools have reached technological maturity in certain areas such as “database connectivity and data ingestion, security, data visualization, and slice-and-dice OLAP capabilities”. At the same time he points out the lack of demand: Fifty-six percent of global data and analytics decision makers (seniority level of manager or above) say their firms are currently in the beginner stage of their insights-driven transformation. Further anecdotal evidence shows that enterprises use no more than 20% of their data for insights, and less than 20% of knowledge workers use enterprise BI applications, still preferring spreadsheets and other shadow IT approaches. The reason is, as he points out, “the low maturity of the people/process/data”. BI vendors are trying to solve this issue by extending their solutions into end-to-end tools. Consider Pentaho’s integration with Lumada: Lumada’s focus is on covering the entire data lifecycle, from the integration of various data sources to the evaluation of video and IoT data in compliance with GDPR (DSGVO) regulations and their deployment in self-service applications. Pentaho’s plans for its

Data load processing in the data warehouse to handle deletes

When you are populating your data vault, you might need to delete your stage tables in an asynchronous way: load -> staging -> integration layer. Only when – and only when – you have populated the integration layer can you delete the entries from your load table. One way to achieve this is to implement a delete-tracking table that tracks your deletes. The process looks like this:

1. Set up a metadata table that contains your target table and its source table.
2. After populating a table in the integration layer, store this information in the delete-tracking table. Concretely, you track the source table, the table in the integration layer, and the highest load date in your table in the integration layer.
3. Initiate the delete process: for each source table defined in your metadata table, get the lowest load date from the delete-tracking table. If there is no entry in your delete-tracking table, use 1753 (i.e. 1753-01-01) as a default. Delete every entry from your source table whose load date is lower than this load date.
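
A minimal sketch of this process, using sqlite3 in Python; the table and column names (metadata, delete_tracking, stg_customers, load_date) are illustrative assumptions, and a real data vault implementation would run against the actual warehouse.

```python
# Minimal sketch of the delete-tracking process (assumes sqlite3; table and
# column names are illustrative, not a prescribed data vault schema).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE metadata (target_table TEXT, source_table TEXT);
    CREATE TABLE delete_tracking (
        source_table TEXT, integration_table TEXT, highest_load_date TEXT);
    CREATE TABLE stg_customers (id INTEGER, load_date TEXT);  -- example stage table
""")

def track_load(source_table, integration_table, highest_load_date):
    """Step 2: after populating the integration layer, record what was loaded."""
    con.execute("INSERT INTO delete_tracking VALUES (?, ?, ?)",
                (source_table, integration_table, highest_load_date))

def delete_processed_rows():
    """Step 3: delete staged rows that every integration table has already consumed."""
    sources = con.execute("SELECT DISTINCT source_table FROM metadata").fetchall()
    for (source_table,) in sources:
        row = con.execute(
            "SELECT MIN(highest_load_date) FROM delete_tracking WHERE source_table = ?",
            (source_table,)).fetchone()
        low_water_mark = row[0] or "1753-01-01"  # default when nothing is tracked yet
        con.execute(f"DELETE FROM {source_table} WHERE load_date < ?",
                    (low_water_mark,))
    con.commit()
```

In this sketch, track_load corresponds to step 2 and delete_processed_rows to step 3; taking the minimum over all tracked integration tables ensures a staged row is only deleted once every consumer has processed it.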

How Coinbase is building a crypto empire for users’ crypto lifecycles

Recently, Coinbase acquired the task platform Earn.com. Coinbase is an online platform for users and merchants to buy, sell, and accept cryptocurrencies. For these activities Coinbase has three different products:

- GDAX exchange: buying and selling of cryptocurrencies for institutional and professional investors
- Coinbase.com: buying and selling of cryptocurrencies for “mainstream” users
- Coinbase Commerce: a payment system for merchants to accept cryptocurrency payments

Earn.com is a task platform where users earn bitcoin for completing tasks. The tasks are offered by blockchain startups doing an ICO and involve things like signing up for newsletters or joining Telegram groups. Those blockchain startups are very often in an early phase, and Earn.com serves as a marketing tool for them. Some argue that the acquisition was an acqui-hire of Earn founder and CEO Balaji Srinivasan. And Balaji Srinivasan, who has an impressive track record (among other things as a partner at Andreessen Horowitz), is now indeed Coinbase’s CTO. While acqui-hiring Balaji Srinivasan might be the acquisition’s real intention, looking at the acquisition in the context of Coinbase’s other acquisitions and its self-imposed company description shows another perspective, namely that Coinbase is building a “crypto empire” serving a user’s whole “crypto lifecycle”. Building a crypto empire with Earn.com, Cipher Browser, Coinbase.com, and

Thoughts on Stellar's Randos Per Week in the context of increased crypto awareness

In their “Stellar 2018 Roadmap” (see Thoughts on “Stellar 2018 Roadmap”), Stellar jokingly (at least I hope so) shared “the critical indicator for a decentralized protocol” (original emphasis), namely randos per week (r.p.w. or rpw; the number of random people talking about crypto) and promised equally moonish growth. Although meant as a joke, there is some truth in those numbers. The number of average, “non-crypto” people talking about it has, at least in my perception, increased in the last couple of weeks and months. More importantly, such popularity metrics matter for the diffusion of cryptoassets; the more people know about it, the greater the likelihood of acceptance. Nevertheless, the recently increased popularity of crypto is not without its caveats.

Prevalence of common misconceptions hindering diffusion: Firstly, a lot of the attention is still on getting rich, on crypto being a bubble, and on people confusing all alts with Bitcoin. As long as these misconceptions prevail, crypto won’t reach mass-market adoption.

Creation of overhyped interest leading to a bursting bubble: On the one hand, blockchain and co. are overhyped to be the next big thing, and if possible right now. On the other hand, adoption is either low or not perceived because it is happening under the hood. For instance, Stellar’s partnership with Tempo

Avoiding, Reporting, and Shilling: Three strategies towards crypto partnership reporting

It seems to me that most crypto projects follow one of three strategies in their reporting of partnerships:

- Avoiding: being quiet about partnerships
- Reporting: being deliberate about partnerships
- Shilling: using partnerships for shilling and pumping the price

I have added shilling for the sake of completeness, but it is a senseless practice and I won’t discuss it any further here. Avoiding and reporting stand in contrast to each other: on the one hand, avoiding prevents an over-focus on price but lowers trust and transparency. Reporting, on the other hand, although not intended, can lead to unexplainable price increases but helps the project gain momentum. Although, as so often, the truth lies somewhere between avoiding and reporting, I believe that we will see more granularity in the future (different news will be handled differently) and, most importantly, I take the view that in the long run the crypto world will adopt best practices from the non-crypto industry.

Two mental steps towards cryptoasset diffusion

I see the diffusion of cryptoassets as a two-step process in which we move from one mental model to the other. These models are:

1. Acceptance of cryptoassets in general: Initially, people must accept the concept of cryptoassets per se.
2. Acceptance of one particular cryptoasset: Secondly, once people understand cryptoassets and believe they are better than whatever they replace, people must accept that one particular coin for that one particular use case.

Currently, we are at step one. Today’s cryptos are thus confronted with two tasks: convince people that their general idea makes sense, and convince people that their particular implementation (i.e. their crypto) makes sense. For both, especially the first, a lot of resilience is required, and many won’t have that. More important, however, is that once we have crossed step one, newcomers could come in with their new implementation and successfully complete step two based on the work of the previous generation.