chore: improve providence docs (#2408)

* chore: improve providence docs
* Apply suggestions from code review
2024-11-12 17:10:06 +01:00 · 2024-11-12 17:10:06 +01:00 · a19096e703
commit a19096e703
parent faca1916fc
11 changed files with 48 additions and 1636 deletions
--- a/.changeset/unlucky-rabbits-appear.md
+++ b/.changeset/unlucky-rabbits-appear.md
@ -0,0 +1,5 @@
 ---
 'providence-analytics': patch
 ---
 improve docs
--- a/docs/fundamentals/node-tools/providence-analytics/LocalConfiguration.md
+++ b/docs/fundamentals/node-tools/providence-analytics/LocalConfiguration.md
@ -1,68 +0,0 @@
 # Node Tools >> Providence Analytics >> Local configuration ||40
 The Providence configuration file is read by providence cli (optional) and by the dashboard (required).
 It has a few requirements:
 - it must be called `providence.conf.js` or `providence.conf.mjs`
 - it must be in ESM format
 - it must be located in the root of a repository (under `process.cwd()`)
 ## Meta data
 ### Category info
 Based on the filePath of a result, a category can be added.
 For example:
 ```js
 export default {
  metaConfig: {
    categoryConfig: [
      {
        // This is the name found in package.json
        project: '@lion/ui/root.js',
        // These conditions will be run on every filePath
        categories: {
          core: p => p.startsWith('./packages/core'),
          utils: p => p.startsWith('./packages/ajax') || p.startsWith('./packages/localize'),
          overlays: p =>
            p.startsWith('./packages/overlays') ||
            p.startsWith('./packages/dialog') ||
            p.startsWith('./packages/tooltip'),
          ...
        },
      },
    ],
  },
 }
 ```
 > N.B. category info is regarded as subjective, therefore it's advised to move this away from
 > Analyzers (and thus file-system cache). Categories can be added realtime in the dashboard.
 ## Project paths
 ### referenceCollections
 A list of file system paths. They can be defined relative from the current project root or they can be full paths.
 When a [MatchAnalyzer](../../../fundamentals/node-tools/providence-analytics/analyzer.md) like `match-imports` or `match-subclasses` is used, the default reference(s) can be configured here. For instance: `['/path/to/@lion/ui/form.js']`
 An example:
 ```js
  referenceCollections: {
    // Our products
    'lion-based-ui': [
      './providence-input-data/references/lion-based-ui',
      './providence-input-data/references/lion-based-ui-labs',
    ],
    ...
  }
 ```
 ### searchTargetCollections
 A list of file system paths. They can be defined relative from the current project root
 or they can be full paths.
 When not defined, the current project will be the search target (this is most common when
 providence is used as a dev dependency).
--- a/docs/fundamentals/node-tools/providence-analytics/QueryResult.md
+++ b/docs/fundamentals/node-tools/providence-analytics/QueryResult.md
@ -1,111 +0,0 @@
 # Node Tools >> Providence Analytics >> QueryResult ||50
 When an Analyzer has run, it returns a QueryResult. This is a json object that contains all
 meta info (mainly configuration parameters) and the query output.
 A QueryResult always contains the analysis of one project (a target project). Optionally,
 it can contain a reference project as well.
 ## Anatomy
 A QueryResult starts with a meta section, followed by the actual results
 ### Meta
 The meta section lists all configuration options the analyzer was run with. Here, you see an
 example of a `find-imports` QueryResult:
 ```js
  "meta": {
    "searchType": "ast-analyzer",
    "analyzerMeta": {
      "name": "find-imports",
      "requiredAst": "babel",
      "identifier": "importing-target-project_0.0.2-target-mock__1970011674",
      "targetProject": {
        "name": "importing-target-project",
        "commitHash": "3e5014d6ecdff1fc71138cdb29aaf7bf367588f5",
        "version": "0.0.2-target-mock"
      },
      "configuration": {
        "keepInternalSources": false
      }
    }
  },
 ```
 ### Output
 The output is usually more specifically tied to the Analyzer. What most regular Analyzers
 (not being MatchAnalyzers that require a referenceProjectPath) have in common, is that their
 results are being shown per "entry" (an entry corresponds with an AST generated by Babel, which in
 turn corresponds to a file found in a target or reference project).
 Below an example is shown of `find-imports` QueryOutput:
 ```js
  "queryOutput": [
    {
      "project": {
        "name": "importing-target-project",
        "mainEntry": "./target-src/match-imports/root-level-imports.js",
        "version": "0.0.2-target-mock",
        "commitHash": "3e5014d6ecdff1fc71138cdb29aaf7bf367588f5"
      },
      "entries": [
        {
          "file": "./target-src/find-imports/all-notations.js",
          "result": [
            {
              "importSpecifiers": [
                "[file]"
              ],
              "source": "imported/source",
              "normalizedSource": "imported/source",
              "fullSource": "imported/source"
            },
            {
              "importSpecifiers": [
                "[default]"
              ],
              "source": "imported/source-a",
              "normalizedSource": "imported/source-a",
              "fullSource": "imported/source-a"
            },
            ...
 ```
 MatchAnalyzers usually do post processing on the entries. The output below (for the `match-imports`
 Analyzer) shows an ordering by matched specifier.
 ```js
 "queryOutput": [
    {
      "exportSpecifier": {
        "name": "[default]",
        "project": "exporting-ref-project",
        "filePath": "./index.js",
        "id": "[default]::./index.js::exporting-ref-project"
      },
      "matchesPerProject": [
        {
          "project": "importing-target-project",
          "files": [
            "./target-src/match-imports/root-level-imports.js",
            "./target-src/match-subclasses/internalProxy.js"
          ]
        }
      ]
    },
    ...
 ```
 Due to some legacy decisions, the QueryOutput allows for multiple target- and reference projects.
 Aggregation of data now takes place in the dashboard.
 QueryOutputs always contain one or a combination of two projects. This means that the
 QueryOutput structure could be simplified in the future.
 ## Environment agnosticism
 The output files stored in the file system always need to be machine independent:
 this means that all machine specific information, like a complete filepath, needs to be removed from a QueryOutput (paths relative from project root are still allowed).
 In that way, the caching mechanism (based on hash comparisons) as described in [Analyzer](../../../fundamentals/node-tools/providence-analytics/analyzer.md) is guaranteed to work across different machines.
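As an illustration of the machine independence described above, here is a minimal sketch of how an absolute file path could be reduced to a path relative to the project root before it is stored. The helper name `toProjectRelativePath` is hypothetical and not part of Providence; it only shows the idea, under the assumption that paths are plain strings.

```js
// Hypothetical helper (not part of Providence): strips the machine specific
// project root from an absolute path, so only './relative/path.js' is stored.
function toProjectRelativePath(absolutePath, projectRoot) {
  // Use forward slashes everywhere and drop trailing slashes
  const normalize = p => p.replace(/\\/g, '/').replace(/\/+$/, '');
  const file = normalize(absolutePath);
  const root = normalize(projectRoot);
  if (!file.startsWith(`${root}/`)) {
    throw new Error(`"${absolutePath}" is not inside project root "${projectRoot}"`);
  }
  return `.${file.slice(root.length)}`;
}
```

With this, `/Users/me/repo/src/a.js` and `C:\work\repo\src\a.js` would both be stored as `./src/a.js`, so results written on one machine stay comparable on another.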
--- a/docs/fundamentals/node-tools/providence-analytics/analyzer.md
+++ b/docs/fundamentals/node-tools/providence-analytics/analyzer.md
@ -1,69 +0,0 @@
 # Node Tools >> Providence Analytics >> Analyzer ||20
 Analyzers form the core of Providence. They contain predefined queries based on AST traversal/analysis.
 A few examples are:
 - find-imports
 - find-exports
 - match-imports
 An analyzer will give back a [QueryResult](../../../fundamentals/node-tools/providence-analytics/QueryResult.md) that will be written to the file system by Providence.
 All analyzers need to extend from the `Analyzer` base class, found in `src/program/analyzers/helpers`.
 ## Public api
 Providence has the following configuration api:
 - name (string)
 - requiresReference (boolean)
  An analyzer will always need a targetProjectPath and can optionally have a referenceProjectPath.
  In the latter case, it needs to have `requiresReference: true` configured.
 During AST traversal, the following api can be consulted
 - `.targetData`
 - `.referenceData`
 - `.identifier`
 ## Phases
 ### Prepare phase
 In this phase, all preparations are done to run the analysis. Providence is designed to be performant and therefore first checks whether a cached result already exists for the current setup.
 ### Traverse phase
 The ASTs are created for all projects involved and the data are extracted into a QueryOutput. This output can optionally be post processed.
 ### Finalize phase
 The data are normalized and written to the filesystem in JSON format.
 ## Targets and references
 Every Analyzer needs a targetProjectPath: a file path string that points to the target project.
 ## Types
 We can roughly distinguish two types of analyzers: those that require a reference and those that don't require a reference.
 ## Database
 In order to share data across multiple machines, results are written to the filesystem in a
 "machine agnostic" way. They can be shared through git and serve as a local database.
 ### Caching
 In order to make caching possible, Providence creates an "identifier": a hash from the combination of project versions + Analyzer configuration. When an identifier already exists in the filesystem, the result can be read from cache. This increases performance and helps mitigate memory problems that can occur when handling large amounts of data in a batch.
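The caching idea can be sketched as follows. This is not Providence's actual implementation: the function names, the identifier layout and the use of an FNV-1a string hash are assumptions made for illustration, and a `Map` stands in for the file system.

```js
// Hypothetical sketch: none of these names are part of Providence's API.

// Small deterministic 32-bit string hash (FNV-1a), standing in for
// whatever hash function Providence really uses.
function hashString(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i += 1) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Serializes with sorted keys, so key order does not influence the hash
function stableStringify(value) {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (value && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return `{${keys.map(k => `${JSON.stringify(k)}:${stableStringify(value[k])}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

// Combines project names + versions with a hash of the analyzer configuration
function createIdentifier({ targetProject, referenceProject, analyzerConfig }) {
  const projects = [targetProject, referenceProject]
    .filter(Boolean)
    .map(p => `${p.name}_${p.version}`)
    .join('_+_');
  return `${projects}__${hashString(stableStringify(analyzerConfig))}`;
}

// Returns the cached result when the identifier is known, else runs the analysis
function getOrRun(cache, identifier, runAnalysis) {
  if (cache.has(identifier)) return cache.get(identifier);
  const result = runAnalysis();
  cache.set(identifier, result);
  return result;
}
```

Because the identifier changes whenever a project version or a configuration value changes, a stale result is never picked up; an unchanged setup resolves to the same identifier and skips the analysis.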
 ## Analyzer helpers
 Inside the folder './src/program/analyzers', a folder 'helpers' is found.
 Helpers are created specifically for use within analyzers and have knowledge about
 the context of the analyzer (knowledge about an AST and/or QueryResult structure).
 Generic functionality (that can be applied in any context) can be found in './src/program/utils'.
 ## Post processors
 Post processors are imported by analyzers and act on their outputs. They can be enabled via the configuration of an analyzer and can be found in './src/program/analyzers/post-processors'. For instance, a post processor can transform the output of the 'find-imports' analyzer by sorting on specifier instead of the default (entry). Unlike most analyzer configuration options, post processors act on the total result of all analyzed files instead of on a single file/AST entry.
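To make the "sorting on specifier" example concrete, here is a minimal sketch of such a post processor. The function name `sortBySpecifier` and the output shape are assumptions for illustration; only the input shape (`file`, `result`, `importSpecifiers`, `source`) follows the `find-imports` entries shown in the QueryResult docs.

```js
// Hypothetical post processor (not Providence's own): regroups per-file
// 'find-imports' entries so that every specifier lists the files using it.
function sortBySpecifier(entries) {
  const bySpecifier = new Map();
  for (const { file, result } of entries) {
    for (const { importSpecifiers, source } of result) {
      for (const specifier of importSpecifiers) {
        const id = `${specifier}::${source}`;
        if (!bySpecifier.has(id)) {
          bySpecifier.set(id, { specifier, source, files: [] });
        }
        const group = bySpecifier.get(id);
        if (!group.files.includes(file)) group.files.push(file);
      }
    }
  }
  // Order the groups alphabetically by specifier name
  return [...bySpecifier.values()].sort((a, b) => {
    if (a.specifier < b.specifier) return -1;
    if (a.specifier > b.specifier) return 1;
    return 0;
  });
}
```

Note that the function needs the entries of all files at once, which is why a step like this runs on the total result rather than per file.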
--- a/docs/fundamentals/node-tools/providence-analytics/assets/_mermaid.svg.js
+++ b/docs/fundamentals/node-tools/providence-analytics/assets/_mermaid.svg.js
--- a/docs/fundamentals/node-tools/providence-analytics/assets/analyzer-query.gif
+++ b/docs/fundamentals/node-tools/providence-analytics/assets/analyzer-query.gif
--- a/docs/fundamentals/node-tools/providence-analytics/assets/feature-query.gif
+++ b/docs/fundamentals/node-tools/providence-analytics/assets/feature-query.gif
--- a/docs/fundamentals/node-tools/providence-analytics/assets/provicli.gif
+++ b/docs/fundamentals/node-tools/providence-analytics/assets/provicli.gif
--- a/docs/fundamentals/node-tools/providence-analytics/assets/providash.gif
+++ b/docs/fundamentals/node-tools/providence-analytics/assets/providash.gif
--- a/docs/fundamentals/node-tools/providence-analytics/dashboard.md
+++ b/docs/fundamentals/node-tools/providence-analytics/dashboard.md
@ -1,22 +0,0 @@
 # Node Tools >> Providence Analytics >> Dashboard ||30
 An interactive overview of all aggregated [QueryResults](../../../fundamentals/node-tools/providence-analytics/QueryResult.md) can be found in the dashboard.
 The dashboard is a small nodejs server (based on es-dev-server + middleware) and a frontend
 application.
 ## Run
 Run `providence dashboard` to start the dashboard and automatically open it in the browser.
 ## Interface
 - Select all reference projects
 - Select all target projects
 ### Generate csv
 When `get csv` is pressed, a `.csv` will be downloaded that can be loaded into Excel.
 ## Analyzer support
 Currently, `match-imports` and `match-subclasses` are supported, more analyzers will be added in the future.
--- a/docs/fundamentals/node-tools/providence-analytics/overview.md
+++ b/docs/fundamentals/node-tools/providence-analytics/overview.md
@ -2,7 +2,6 @@
 ```js script
 import { html } from '@mdjs/mdjs-preview';
 import { providenceFlowSvg, providenceInternalFlowSvg } from './assets/_mermaid.svg.js';
 ```
 Providence is the 'All Seeing Eye' that generates usage statistics by analyzing code.
@ -10,8 +9,6 @@ It measures the effectivity and popularity of your software.
 With just a few commands you can measure the impact for (breaking) changes, making
 your release process more stable and predictable.
 Providence can be used as a dev dependency in a project for which metrics
 can be generated via analyzers (see below).
 For instance for a repo "lion-based-ui" that extends @lion/\* we can answer questions like:
 - **Which subsets of my product are popular?**
@ -23,170 +20,76 @@ For instance for a repo "lion-based-ui" that extends @lion/\* we can answer ques
 - etc...
-All the above results can be shown in a dashboard (see below), which allows to sort exports from reference project (@lion) based on popularity, category, consumer etc.
+Providence uses abstract syntax trees (ASTs) to have the most advanced analysis possible.
-The dashboard allows to aggregate data from many target projects as well and will show you on a detailed (file) level how those components are being consumed by which projects.
+It does this via the [oxc parser](https://oxc.rs/docs/guide/usage/parser.html), the quickest parser available today!
-## Setup
+## Run
-### Install providence
+Providence expects an analyzer name that tells it what type of analysis to run:
 ```bash
-npm i --save-dev providence-analytics
+npx providence analyze <analyzer-name>
 ```
-### Add a providence script to package.json
+By default Providence ships these analyzers:
 ```json
 "scripts": {
  "providence:match-imports": "providence analyze match-imports -r 'node_modules/@lion/ui/*.js'",
 }
 ```
 > The example above illustrates how to run the "match-imports" analyzer for reference project 'lion-based-ui'. Note that it is possible to run other analyzers and configurations supported by providence as well. For a full overview of cli options, run `npx providence --help`. All supported analyzers are listed when running `npx providence analyze`.
 You are now ready to use providence in your project. All
 data will be stored in json files in the folder `./providence-output`
 ![CLI](./assets/provicli.gif 'CLI')
 ## Setup: Dashboard
 ### Add "providence:dashboard" script to package.json
 ```js
 ...
 "scripts": {
    ...
    "providence:dashboard": "providence dashboard"
 }
 ```
 ### Add providence.conf.js
 ```js
 export default {
  referenceCollections: {
    'lion-based-ui-collection': [
      './node_modules/lion-based-ui/packages/x',
      './node_modules/lion-based-ui/packages/y',
    ],
  },
 };
 ```
 Run `npm run providence:dashboard`
 ![dashboard](./assets/providash.gif 'dashboard')
 ## Setup: about result output
 All output files will be stored in `./providence-output`.
 This means they will be committed to git, so your colleagues don't have to
 rerun the analysis (for large projects with many dependencies this can be time consuming)
 and can directly start the dashboard usage metrics.
 Also, note that the files serve as cache (they are stored with hashes based on project version and analyzer configuration). This means that an interrupted analysis can be
 resumed later on.
 ## Conceptual overview
 Providence performs queries on one or more search targets.
 These search targets consist of one or more software projects (javascript/html repositories)
 The diagram below shows how the `providenceMain` function can be used from an external context.
 ```js story
 export const providenceFlow = () => providenceFlowSvg;
 ```
 ## Flow inside providence
 The diagram below depicts the flow inside the `providenceMain` function.
 It uses:
 - InputDataService
  Used to create a data structure based on a folder (for instance the search target or
  the references root). The structure creates entries for every file, which get enriched with code,
  ast results, query results etc. Returns `InputData` object.
 - QueryService
  Requires a `queryConfig` and `InputData` object. It will perform a query (grep search or ast analysis)
  and returns a `QueryResult`.
  It also contains helpers for the creation of a `queryConfig`
 - ReportService
  The result gets outputted to the user. Currently, a log to the console and/or a dump to a json file
  are available as output formats.
 ```js story
 export const providenceInternalFlow = () => providenceInternalFlowSvg;
 ```
 ## Queries
 Providence requires queries as input.
 Queries are defined as objects and can be of two types:
 - feature-query
 - ast-analyzer
 A `QueryConfig` is required as input to run the `providenceMain` function.
 This object specifies the type of query and contains the relevant meta
 information that will later be outputted in the `QueryResult` (the JSON object that
 the `providenceMain` function returns.)
 ## Analyzer Query
 Analyzer queries are also created via `QueryConfig`s.
 Analyzers can be described as predefined queries that use AST traversal.
 Run:
 ```bash
 providence analyze
 ```
 Now you will get a list of all predefined analyzers:
 - find-imports
 - find-exports
 - find-classes
 - match-imports
 - match-subclasses
 - etc...
-![Analyzer query](./assets/analyzer-query.gif 'Analyzer query')
+Let's say we run `find-imports`:
 <!--
 ## Running providence from its own repo
 ### How to add a new search target project
 ```bash
-git submodule add <git-url> ./providence-input-data/search-targets/<project-name>
+npx providence analyze find-imports
 ```
-### How to add a reference project
+Now it retrieves all relevant data about es module imports.
 There are plenty of edge cases that it needs to take into account here;
 you can have a look at the tests to get an idea about all different cases Providence handles for you.
-By adding a reference project, you can automatically see how code in your reference project is
+## Projects
-used across the search target projects.
+
-Under the hood, this automatically creates a set of queries for you.
+Providence uses the concept of projects. A project is a piece of software to analyze:
 usually an npm artifact or a git (mono-)repository. What all projects have in common,
 is a package.json. From it, the following project data is derived:
 - the name
 - the version
 - the files it uses for scanning. One of the following strategies is usually followed:
  - exportmap entrypoints (by 'expanding' package.json "exports" on file system)
  - npm files (it reads package.json "files" | .npmignore)
  - the git files (it reads .gitignore)
  - a custom defined list
 For a "find" analyzer, there is one project involved (the target project).
 We can specify it like this (we override the default current working directory):
 ```bash
-git submodule add <git-url> ./providence-input-data/references/<project-name>
+npx providence analyze find-imports -t /importing/project
 ```
-### Updating submodules
+For a "match" analyzer, there is also a reference project.
-
+Here we match the exports of the reference project (-r) against the imports of the target project (-t).
 Please run:
 ```bash
-git submodule update --init --recursive
+npx providence analyze match-imports -t /importing/project -r /exporting/project
 ```
-### Removing submodules
+## Utils
-Please run:
+Providence comes with many tools for deep traversal of identifiers,
 the (babel like) traversal of ast trees in oxc and swc and more.
 Also more generic utils for caching and performant globbing come delivered with Providence.
 For a better understanding, check out the utils folders (tests and code).
 ## More
 For more options, see:
 ```bash
-sh ./rm-submodule.sh <path/to/submodule>
+npx providence --help
 ```
 -->