Tree hack - building and printing a tree

 

In which I made two snippets for my trees.

Background

Recently, I had to write a python script that transforms incoming data organizsed in a folder structure. For this, I needed both a simple python representation and an easy way to print the data.

Consider a project which collects sightings of Bigfoot, Nessie, and Yeti as .csv files for each day when there was a sighting. As seen in Figure 1, the data is organized into folders by cryptid and month.

Folder strucutre

Figure 1: Folder structure

In my terminal, I’d show the structure with tree or lsd. In python, I could use treelib. However, terminal tools don’t fit my actual needs, and for python, it feels excessive with a whole library for such a simple thing. Time to reinvent the wheel…

Building a tree from a directory

I python version 3.4 or later, pathlib is included, which can get directory contents by globbing. For instance:

list(Path("data/").glob("**/*.csv"))

gives

[PosixPath('data/nessie/2021-04/2021-04-03-sighting.csv'), PosixPath('data/nessie/2021-02/2021-02-21-sighting.csv'), ...

Note: this approach may fail for large directory trees, and not sure what happens if there are circular links in the tree. A simple way to get a tree representation without any classes is to use a dictionary {"root": children}. Here, children is another dictionary. For leaf nodes such as the .csv files, set the value to an empty dictionary.

{'data': {'nessie': {'2021-04': {'2021-04-03-sighting.csv': {}, ...} , ...}, ...}}

putting this together gives the following python snippet

from typing import Any, Dict
from pathlib import Path


def path_tree(path: Path, pattern: str = "**/*") -> Dict[str, Any]:
    """Build tree from path

    Example:
        path_tree(".", "**/*.csv")
        >>> {'directory': {'file1.csv': {}, 'sub_directory': {'file2.csv': {}}}}

    Args:
        path (Path): Path to start search from
        pattern (str, optional): Pattern to glob all reports

    Returns:
        Dict[str, Any]: Dict of directories and report files
    """
    tree: Dict[Any, Any] = {}
    for p in Path(path).glob(pattern):
        t = tree
        for q in p.relative_to(Path(path)).parts:
            t[q] = t.get(q, {})  # if q != p[-1] else True)
            t = t[q]
    return tree

Pretty printing a tree

True its batteries-included philosophy, python comes with a pretty-printer:

>>> pprint.pprint(a)
{'data': {'bigfoot': {'2021-01': {'2021-01-07-sighting.csv': {}},
                      '2021-03': {'2021-03-04-sighting.csv': {}, 
                      ....

This often is good enough. However, I want something more tree-like so let’s add a basic visualization using box-drawing characters. Here, I recursively loops through all dictionaries while remembering where to put . As with glob, for large and highly nested trees (or cyclic content) this could perform poorly - but for most uses, it’s perfect.

from typing import Any, Dict, List
def tree_to_str(tree: Dict[Any, Any], b: List[bool] = []) -> str:
    """Make string from tree

    Example:
        tree_to_str({'directory': {'file1.csv': {}, 'sub_directory': {'file2.csv': {}}}})
        >>> directory
            ├─ file1.csv
            └─ sub_directory
               └─ file2.csv

    Args:
        tree (Dict[Any, Any]): Dict with dicts
        b (List[bool], optional): stack to remember pipes

    Returns:
        str: String representation
    """
    s = ""
    for i, (k, v) in enumerate(tree.items()):
        s += (
            "".join(["" if q else "   " for q in b]) + (
                "" if i == len(tree) - 1 else "") + f"{k}\n"
        )
        b.append(i < len(tree) - 1)
        s += tree_to_str(v, b)
        b.pop()
    return s

Putting it all together

>>> print(str_tree(path_tree(".", "**/*.csv")))
└─ data
   ├─ nessie
     ├─ 2021-04
       └─ 2021-04-03-sighting.csv
     └─ 2021-02
        ├─ 2021-02-21-sighting.csv
        └─ 2021-02-01-sighting.csv
   ├─ bigfoot
     ├─ 2021-01
       └─ 2021-01-07-sighting.csv
     └─ 2021-03
        ├─ 2021-03-27-sighting.csv
        ├─ 2021-03-04-sighting.csv
        └─ 2021-03-07-sighting.csv
   └─ yeti
      └─ 2021-04
         ├─ 2021-04-29-sighting.csv
         └─ 2021-04-14-sighting.csv

Great! - just like the tree in the command line. And all with just in two simple functions. Like anything on this site, use these snippets as you please at your own risk, and feel free to credit me when you do it.


Header image: My crop of Photo by Cristina Gottardi on Unsplash