Organize, Search, and Back Up Files with Python’s Pathlib


 

Python’s built-in pathlib module makes working with filesystem paths super simple. In How To Navigate the Filesystem with Python’s Pathlib, we looked at the fundamentals of working with path objects and navigating the filesystem. It’s time to go further.


In this tutorial, we’ll go over three specific file management tasks using the capabilities of the pathlib module:

  • Organizing files by extension
  • Searching for specific files
  • Backing up important files

By the end of this tutorial, you'll have learned how to use pathlib for file management tasks. Let’s get started!

 

1. Organize Files by Extension

 

When you’re researching for and working on a project, you’ll often create ad hoc files and download related documents into your working directory until it's cluttered and you need to organize it.

Let’s take a simple example where the project directory contains requirements.txt, config files, and Python scripts. We’d like to sort the files into subdirectories, one for each extension. For convenience, let’s use the extensions as the names of the subdirectories.

 

Figure: Organize Files by Extension | Image by Author

 

Here’s a Python script that scans a directory, identifies files by their extensions, and moves them into respective subdirectories:

# organize.py

from pathlib import Path

def organize_files_by_extension(path_to_dir):
    path = Path(path_to_dir).expanduser().resolve()
    print(f"Resolved path: {path}")

    if path.exists() and path.is_dir():
        print(f"The directory {path} exists. Proceeding with file organization...")

        # Snapshot the listing first, since subdirectories are created while looping
        for item in list(path.iterdir()):
            print(f"Found item: {item}")
            if item.is_file():
                extension = item.suffix.lower()
                if not extension:
                    # Files without an extension are left where they are
                    continue
                target_dir = path / extension[1:]  # Remove the leading dot

                # Ensure the target directory exists
                target_dir.mkdir(exist_ok=True)
                new_path = target_dir / item.name

                # Move the file
                item.rename(new_path)

                # Check if the file has been moved
                if new_path.exists():
                    print(f"Successfully moved {item} to {new_path}")
                else:
                    print(f"Failed to move {item} to {new_path}")

    else:
        print(f"Error: {path} does not exist or is not a directory.")

organize_files_by_extension('new_project')

 

The organize_files_by_extension() function takes a directory path as input, resolves it to an absolute path, and organizes the files within that directory by their file extensions. It first ensures that the specified path exists and is a directory.

Then, it iterates over all items in the directory. For each file, it retrieves the file extension, creates a new directory named after the extension (if it doesn't already exist), and moves the file into this new directory.

After moving each file, it confirms the success of the operation by checking the existence of the file in the new location. If the specified path does not exist or is not a directory, it prints an error message.
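
To make the extension step concrete, here is a small sketch of how a subdirectory name falls out of a file name. The helper name subdirectory_name is hypothetical and used only for illustration; it applies the same suffix, lowercase, and strip-the-dot steps as the script.

```python
from pathlib import Path


def subdirectory_name(filename):
    """Return the lowercase extension without its leading dot (hypothetical helper)."""
    return Path(filename).suffix.lower()[1:]


print(subdirectory_name('Notes.TXT'))        # txt
print(subdirectory_name('archive.tar.gz'))   # gz (only the last suffix counts)
print(repr(subdirectory_name('README')))     # '' (no extension)
print(repr(subdirectory_name('.gitignore'))) # '' (a leading dot is not a suffix)
```

Files that yield an empty string have no extension-named subdirectory to go to, so it is worth deciding up front how you want to treat them.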

Here’s the output for the example function call (organizing files in the new_project directory):

 
[Image: terminal output of organize.py]
 

Now try this on a project directory in your working environment. I’ve used if-else to account for errors, but you can just as well use try-except blocks to make this version better.
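
As a minimal sketch of the try-except idea, the move step could be wrapped as shown here. The helper name move_file_safely and its convention of returning None on failure are assumptions made for illustration; they are not part of the original script.

```python
from pathlib import Path


def move_file_safely(item, target_dir):
    """Move item into target_dir; return the new path, or None on failure.

    Hypothetical helper, not part of the original script.
    """
    item = Path(item)
    target_dir = Path(target_dir)
    try:
        target_dir.mkdir(parents=True, exist_ok=True)
        new_path = target_dir / item.name
        if new_path.exists():
            # Avoid silently overwriting a file that is already there
            raise FileExistsError(f"{new_path} already exists")
        item.rename(new_path)
    except OSError as err:
        # FileNotFoundError, PermissionError, and FileExistsError are OSError subclasses
        print(f"Failed to move {item}: {err}")
        return None
    print(f"Successfully moved {item} to {new_path}")
    return new_path
```

Catching OSError keeps one bad file from stopping the whole run, and the explicit existence check guards against rename() replacing an existing file on platforms where it would do so silently.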

 

2. Search for Specific Files

 

Sometimes you may not want to organize the files by extension into different subdirectories as in the previous example. You may only want to find all files with a specific extension (like all image files), and for this you can use globbing.

Say we want to find the requirements.txt file to look at the project’s dependencies. Let’s use the same example, but after grouping the files into subdirectories by extension.

If you use the glob() method on the path object as shown to find all text files (defined by the pattern ‘*.txt’), you’ll see that it doesn't find the text file:

# search.py
from pathlib import Path

def search_and_process_text_files(directory):
    path = Path(directory)
    path = path.resolve()
    for text_file in path.glob('*.txt'):
        # process text files as needed
        print(f'Processing {text_file}...')
        print(text_file.read_text())

search_and_process_text_files('new_project')

 

This is because glob() with the pattern '*.txt' only searches the directory itself, which does not contain the requirements.txt file. The requirements.txt file is in the txt subdirectory, so you have to use recursive globbing with the rglob() method instead.
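
The difference is easy to check on a throwaway directory. The sketch below builds a temporary layout that mirrors the organized project (the find_text_files helper is a hypothetical name used only here) and compares the two methods:

```python
import tempfile
from pathlib import Path


def find_text_files(directory):
    """Return (top-level matches, recursive matches) for '*.txt' as sorted lists."""
    path = Path(directory)
    top_level = sorted(path.glob('*.txt'))    # only direct children
    recursive = sorted(path.rglob('*.txt'))   # children at any depth
    return top_level, recursive


# Build a throwaway layout that mirrors the organized project directory
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / 'new_project'
    (project / 'txt').mkdir(parents=True)
    (project / 'txt' / 'requirements.txt').write_text('psycopg2==2.9.0\n')

    top_level, recursive = find_text_files(project)
    print(len(top_level))   # 0: nothing matches directly under new_project
    print(len(recursive))   # 1: requirements.txt inside the txt subdirectory
```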

So here’s the code to find the text files and print out their contents:

from pathlib import Path

def search_and_process_text_files(directory):
    path = Path(directory)
    path = path.resolve()
    for text_file in path.rglob('*.txt'):
        # process text files as needed
        print(f'Processing {text_file}...')
        print(text_file.read_text())

search_and_process_text_files('new_project')

 

The search_and_process_text_files() function takes a directory path as input, resolves it to an absolute path, and searches for all .txt files within that directory and its subdirectories using the rglob() method.

For each text file found, it prints the file’s path and then reads and prints out the file’s contents. This function is useful for recursively locating and processing all text files within a specified directory.

Because requirements.txt is the only text file in our example, we get the following output:

Output >>>
Processing /home/balapriya/new_project/txt/requirements.txt...
psycopg2==2.9.0
scikit-learn==1.5.0

 

Now that you know how to use globbing and recursive globbing, try to redo the first task, organizing files by extension, using globbing to find and group the files and then move them to the target subdirectory.
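
One possible take on this exercise is sketched below. It is not the only way to do it: the function name organize_with_glob, the '*.*' pattern, and the returned summary dictionary are all choices made for illustration.

```python
from collections import defaultdict
from pathlib import Path


def organize_with_glob(path_to_dir):
    """Group top-level files by extension using glob(), then move each group."""
    path = Path(path_to_dir).expanduser().resolve()

    # '*.*' matches names containing a dot; keep regular files with a real suffix
    groups = defaultdict(list)
    for item in path.glob('*.*'):
        extension = item.suffix.lower().lstrip('.')
        if item.is_file() and extension:
            groups[extension].append(item)

    # All matches are collected before anything is moved
    for extension, files in groups.items():
        target_dir = path / extension
        target_dir.mkdir(exist_ok=True)
        for item in files:
            item.rename(target_dir / item.name)

    # Map each extension to the file names that were moved
    return {ext: sorted(f.name for f in files) for ext, files in groups.items()}
```

Collecting the groups first and moving afterwards means the directory is not being modified while glob() is still walking it.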

 

3. Back Up Important Files

 

Organizing files by extension and searching for specific files are the examples we’ve seen so far. But how about backing up certain important files, because why not?

Here we’d like to copy files from the project directory into a backup directory rather than move them to another location. In addition to pathlib, we’ll also use the shutil module’s copy function.

Let’s create a function that copies all files with a specific extension (all .py files) to a backup directory:

# back_up.py
import shutil
from pathlib import Path

def back_up_files(directory, backup_directory):
    path = Path(directory)
    backup_path = Path(backup_directory)
    # Create the backup directory (and any missing parents) if needed
    backup_path.mkdir(parents=True, exist_ok=True)

    for important_file in path.rglob('*.py'):
        shutil.copy(important_file, backup_path / important_file.name)
        print(f'Backed up {important_file} to {backup_path}')


back_up_files('new_project', 'backup')

 

The back_up_files() function takes in an existing directory path and a backup directory path, and backs up all Python files from the specified directory and its subdirectories into the designated backup directory.

It creates path objects for both the source directory and the backup directory, and ensures that the backup directory exists by creating it and any necessary parent directories if they do not already exist.

The function then iterates through all .py files in the source directory using the rglob() method. For each Python file found, it copies the file to the backup directory while retaining the original filename. Essentially, this function helps in creating a backup of all Python files within a project directory.
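
Because only the filename is kept, two files with the same name in different subdirectories would land on the same backup path, and the later copy would replace the earlier one. If that matters for your project, a variant along the following lines keeps the relative layout instead. This is a sketch under assumptions: the function name, the pattern parameter, and the use of shutil.copy2 in place of shutil.copy are choices made here, not part of the original script.

```python
import shutil
from pathlib import Path


def back_up_files_preserving_structure(directory, backup_directory, pattern='*.py'):
    """Copy matching files into backup_directory, keeping their relative paths."""
    source = Path(directory).resolve()
    backup_root = Path(backup_directory).resolve()
    copied = []

    for src in source.rglob(pattern):
        # Skip directories and anything already inside the backup directory
        if not src.is_file() or backup_root in src.parents:
            continue
        destination = backup_root / src.relative_to(source)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, destination)  # copy2 also tries to preserve file metadata
        copied.append(destination)

    return copied
```

For example, back_up_files_preserving_structure('new_project', 'backup') would copy new_project/py/main.py to backup/py/main.py, where main.py stands in for any Python file in the project.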

After running the script and verifying the output, you can always check the contents of the backup directory:

 
[Image: contents of the backup directory]
 

For your own directory, you can use back_up_files('/path/to/directory', '/path/to/backup/directory') to back up files of interest.

 

Wrapping Up

 

In this tutorial, we've explored practical examples of using Python’s pathlib module to organize files by extension, search for specific files, and back up important files. You can find all the code used in this tutorial on GitHub.

As you can see, the pathlib module makes working with file paths and file management tasks easier and more efficient. Now, go ahead and apply these concepts in your own projects to handle your file management tasks better. Happy coding!

 

 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
