Recently I read this article (Cloudera adds data engineering, visualization to its Data Platform) and wondered why it was relevant. To an industry expert the implications are clear, but to me, an industry novice, they are not. This made me wonder how a knowledge graph could be used to automatically answer the “why” behind a company’s decision. Based on an innovation process in patent analysis, I came up with this process:

1. I manually identified a similar article (Tech and Antitrust Follow-up, Google Buys Looker, Salesforce Buys Tableau; paywall).
2. From this article, I constructed the knowledge graph below. I defined explicit relationships (as mentioned in the article) and implicit relationships (in green; these relationships can be inferred from an external knowledge source like Wikipedia).

The knowledge graph depicts the following points:

- Google’s biggest competitors (Microsoft and Amazon) own tools in the data analytics and visualization segment (Looker’s segment).
- The data analytics and visualization segment is important to Google because that segment is part of the “Big Data movement” (implicit relationship) and “Big Data” is important to Google (implicit relationship).
- Google does not own any tools in the analytics segment.

Based on this information, it makes sense that Google acquired a tool in
Month: October 2020
Is AI to Tableau what VLOOKUP is to Excel?
As Ben Thompson from Stratechery wrote on Google’s acquisition of Looker, “data analytics and visualization is a large and growing segment in enterprise software”. As Boris Evelson from Forrester points out, BI tools have reached technological maturity in certain areas such as “database connectivity and data ingestion, security, data visualization, and slice-and-dice OLAP capabilities”. At the same time, he points out the lack of demand:

Fifty-six percent of global data and analytics decision makers (seniority level of manager or above) say their firms are currently in the beginner stage of their insights-driven transformation. Further anecdotal evidence shows that enterprises use no more than 20% of their data for insights, and less than 20% of knowledge workers use enterprise BI applications, still preferring spreadsheets and other shadow IT approaches.

The reason is, as he points out, “the low maturity of the people/process/data”. BI vendors are trying to solve this issue by extending their solutions into end-to-end (E2E) tools. Consider Pentaho’s integration with Lumada:

Lumada’s focus is on covering the entire data lifecycle, from the integration of various data sources to the evaluation of video and IoT data in compliance with GDPR regulations and their deployment in self-service applications.

Pentaho’s plans for its
Data loading process in the data warehouse to handle deletes
When you are populating your data vault, you might need to delete from your staging tables asynchronously. The data flows like this: load -> staging -> integration layer. Only when you have populated the integration layer can you delete the entries from your load table. One way to achieve this is to implement a delete-tracking table that tracks your deletes. The process is like this:

1. Set up a metadata table that contains your target table and its source table.
2. After populating a table in the integration layer, store this information in the delete-tracking table. Concretely, you track the source table, the table in the integration layer, and the highest load date in your table in the integration layer.
3. Initiate the delete process: for each source table defined in your metadata table, get the lowest load date from the delete-tracking table. If there is no entry in your delete-tracking table, use 1753 as a default. Delete every entry from your source table whose load date is lower than this load date.
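
To make the process more concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not a definitive implementation: the table names (meta_tables, delete_tracking), the column names (load_date, max_load_date), the one-row-per-source-and-target layout of the tracking table, and the ISO date format of the 1753 default are all assumptions made for illustration.

```python
import sqlite3

# Default cutoff from the text ("1753"); the full ISO date format is an assumption.
DEFAULT_LOAD_DATE = "1753-01-01"


def create_control_tables(conn):
    """Create the metadata table and the delete-tracking table (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE meta_tables (
            target_table TEXT NOT NULL,
            source_table TEXT NOT NULL
        );
        CREATE TABLE delete_tracking (
            source_table  TEXT NOT NULL,
            target_table  TEXT NOT NULL,
            max_load_date TEXT NOT NULL,
            PRIMARY KEY (source_table, target_table)
        );
        """
    )


def track_load(conn, source_table, target_table):
    """After populating a target table, record its highest load date."""
    max_load_date = conn.execute(
        f'SELECT MAX(load_date) FROM "{target_table}"'
    ).fetchone()[0]
    if max_load_date is None:
        return  # target is empty: nothing to track yet
    # Assumption: one row per (source, target) pair, overwritten on each load.
    conn.execute(
        "INSERT OR REPLACE INTO delete_tracking VALUES (?, ?, ?)",
        (source_table, target_table, max_load_date),
    )


def run_delete_process(conn):
    """Delete source rows older than the tracked load date; return counts per table."""
    sources = [
        row[0]
        for row in conn.execute("SELECT DISTINCT source_table FROM meta_tables")
    ]
    deleted = {}
    for source in sources:
        lowest = conn.execute(
            "SELECT MIN(max_load_date) FROM delete_tracking WHERE source_table = ?",
            (source,),
        ).fetchone()[0]
        # No tracking entry yet: fall back to the default, which deletes nothing.
        cutoff = lowest if lowest is not None else DEFAULT_LOAD_DATE
        # Table names cannot be bound as parameters, so they are interpolated;
        # they come from the metadata table, not from user input.
        cursor = conn.execute(
            f'DELETE FROM "{source}" WHERE load_date < ?', (cutoff,)
        )
        deleted[source] = cursor.rowcount
    return deleted
```

In this sketch you would call track_load after each load into the integration layer and run_delete_process afterwards, possibly on a separate schedule. Rows whose load date equals the cutoff stay in the source table, following the "lower than" rule described above.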