Header image credit1
In which I made two snippets for my trees.
Background
Recently, I had to write a Python script that transforms incoming data organised in a folder structure. For this, I needed both a simple Python representation of that structure and an easy way to print it.
Consider a project that collects sightings of Bigfoot, Nessie, and Yeti as .csv files, one for each day when there was a sighting. As seen in Figure 1, the data is organized into folders by cryptid and month.
In my terminal, I’d show the structure with tree or lsd. In Python, I could use treelib. However, terminal tools don’t fit my actual needs, and pulling in a whole library feels excessive for such a simple thing. Time to reinvent the wheel…
Building a tree from a directory
In Python 3.4 or later, pathlib is part of the standard library, and it can collect directory contents by globbing. For instance:
list(Path("data/").glob("**/*.csv"))
gives
[PosixPath('data/nessie/2021-04/2021-04-03-sighting.csv'), PosixPath('data/nessie/2021-02/2021-02-21-sighting.csv'), ...
Note: this approach may fail for large directory trees, and I am not sure what happens if there are circular links in the tree.
A simple way to get a tree representation without any classes is to use a dictionary {"root": children}. Here, children is another dictionary. For leaf nodes such as the .csv files, set the value to an empty dictionary.
{'data': {'nessie': {'2021-04': {'2021-04-03-sighting.csv': {}, ...} , ...}, ...}}
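To make the dictionary representation concrete, here is a minimal sketch that builds it from a hard-coded list of path strings, without touching the file system. The helper name `insert_path` and the sample paths are my own assumptions for illustration only.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, Iterable


def insert_path(tree: Dict[str, Any], parts: Iterable[str]) -> None:
    """Insert one path, given as its parts, into a nested dict (hypothetical helper)."""
    node = tree
    for part in parts:
        # setdefault returns the existing child dict or creates an empty one
        node = node.setdefault(part, {})


# Illustrative paths only; they mirror the folder layout described above
sample_paths = [
    "data/nessie/2021-04/2021-04-03-sighting.csv",
    "data/nessie/2021-02/2021-02-21-sighting.csv",
    "data/yeti/2021-04/2021-04-29-sighting.csv",
]

tree: Dict[str, Any] = {}
for raw in sample_paths:
    insert_path(tree, PurePosixPath(raw).parts)

print(tree)
```

Because every node is just a dict, directories and files look the same; a leaf is simply a key whose value is an empty dict.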
Putting this together gives the following Python snippet:
from pathlib import Path
from typing import Any, Dict


def path_tree(path: Path, pattern: str = "**/*") -> Dict[str, Any]:
    """Build tree from path

    Example:
        >>> path_tree(".", "**/*.csv")
        {'directory': {'file1.csv': {}, 'sub_directory': {'file2.csv': {}}}}

    Args:
        path (Path): Path to start search from
        pattern (str, optional): Pattern to glob all reports

    Returns:
        Dict[str, Any]: Dict of directories and report files
    """
    root = Path(path)  # also accepts a plain string such as "."
    tree: Dict[str, Any] = {}
    for p in root.glob(pattern):
        t = tree
        # Walk down the tree one path component at a time
        for q in p.relative_to(root).parts:
            # Reuse the existing child dict or create an empty one
            t = t.setdefault(q, {})
    return tree
Pretty printing a tree
True to its batteries-included philosophy, Python comes with a pretty-printer:
>>> pprint.pprint(a)
{'data': {'bigfoot': {'2021-01': {'2021-01-07-sighting.csv': {}},
                      '2021-03': {'2021-03-04-sighting.csv': {},
....
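For a self-contained version of the call above, the following sketch pretty-prints a small hand-written tree. The dictionary contents are illustrative, and `width=60` is just a value I picked to force line wrapping.

```python
import pprint
from typing import Any, Dict

# Hand-written sample tree; the entries are illustrative only
tree: Dict[str, Any] = {
    "data": {
        "bigfoot": {
            "2021-01": {"2021-01-07-sighting.csv": {}},
            "2021-03": {"2021-03-04-sighting.csv": {}},
        },
    }
}

# pformat returns the same text that pprint.pprint would print
formatted = pprint.pformat(tree, width=60)
print(formatted)
```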
This is often good enough. However, I want something more tree-like, so let’s add a basic visualization using box-drawing characters. Here, I recursively loop through all dictionaries while remembering where to put │. As with glob, this could perform poorly for large and highly nested trees (or cyclic content), but for most uses, it’s perfect.
from typing import Any, Dict, List, Optional


def tree_to_str(tree: Dict[Any, Any], b: Optional[List[bool]] = None) -> str:
    """Make string from tree

    Example:
        >>> print(tree_to_str({'directory': {'file1.csv': {}, 'sub_directory': {'file2.csv': {}}}}), end="")
         └─ directory
           ├─ file1.csv
           └─ sub_directory
             └─ file2.csv

    Args:
        tree (Dict[Any, Any]): Dict with dicts
        b (List[bool], optional): stack to remember pipes

    Returns:
        str: String representation
    """
    if b is None:
        b = []  # avoid sharing a mutable default between calls
    s = ""
    for i, (k, v) in enumerate(tree.items()):
        s += (
            "".join([" │" if q else "  " for q in b])
            + (" └" if i == len(tree) - 1 else " ├")
            + f"─ {k}\n"
        )
        # Remember whether this level still needs a pipe on the lines below
        b.append(i < len(tree) - 1)
        s += tree_to_str(v, b)
        b.pop()
    return s
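As a variation on the same recursive idea, here is a minimal sketch that passes the accumulated prefix down as a string instead of keeping a stack, and yields one line per node. The name `tree_lines` is hypothetical, and the two-character indentation per level is my assumption about the intended layout.

```python
from typing import Any, Dict, Iterator


def tree_lines(tree: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield one line per node (hypothetical variant, not the function above)."""
    last = len(tree) - 1
    for i, (name, children) in enumerate(tree.items()):
        # The last sibling gets a corner, the others a tee
        yield prefix + (" └─ " if i == last else " ├─ ") + name
        # Below a non-last sibling the vertical pipe has to continue
        yield from tree_lines(children, prefix + (" │" if i < last else "  "))


example = {"directory": {"file1.csv": {}, "sub_directory": {"file2.csv": {}}}}
print("\n".join(tree_lines(example)))
```

Yielding lines avoids building one large string through repeated concatenation, and the caller can decide whether to join, filter, or stream them.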
Putting it all together
>>> print(tree_to_str(path_tree(".", "**/*.csv")))
 └─ data
   ├─ nessie
   │ ├─ 2021-04
   │ │ └─ 2021-04-03-sighting.csv
   │ └─ 2021-02
   │   ├─ 2021-02-21-sighting.csv
   │   └─ 2021-02-01-sighting.csv
   ├─ bigfoot
   │ ├─ 2021-01
   │ │ └─ 2021-01-07-sighting.csv
   │ └─ 2021-03
   │   ├─ 2021-03-27-sighting.csv
   │   ├─ 2021-03-04-sighting.csv
   │   └─ 2021-03-07-sighting.csv
   └─ yeti
     └─ 2021-04
       ├─ 2021-04-29-sighting.csv
       └─ 2021-04-14-sighting.csv
Great! Just like tree on the command line, and all with just two simple functions. Like anything on this site, use these snippets as you please at your own risk, and feel free to credit me when you do.
1. My crop of a photo by Cristina Gottardi on Unsplash.