This guide provides comprehensive standards for maintaining PEP 8 compliance across the ZOL codebase. It covers manual practices to ensure consistent, readable, and maintainable Python code.
- Code Style Standards
- File Organization
- Naming Conventions
- Documentation Standards
- Common Issues and Fixes
- Manual Code Review Checklist
- Best Practices Summary
- Use 4 spaces per indentation level
- Never use tabs
- Maximum line length: 88 characters (the Black formatter's default; PEP 8 itself recommends 79)

# ✅ Correct
def long_function_name(
    var_one,
    var_two,
    var_three,
):
    print(var_one)

# ❌ Incorrect
def long_function_name(var_one, var_two, var_three):
  print(var_one)

- Group imports in this order:
  - Standard library imports
  - Third-party imports
  - Local application imports
- Use absolute imports when possible
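
The grouping order above can also be checked mechanically during review. The following is a minimal sketch, not part of the ZOL codebase: the helper names `classify_import` and `imports_are_grouped` are hypothetical, `zol` is assumed to be the only local package, and the standard-library lookup relies on `sys.stdlib_module_names`, which exists only on Python 3.10 and later (a tiny fallback set is used otherwise).

```python
import ast
import sys

# sys.stdlib_module_names exists on Python 3.10+; the fallback set is a
# deliberately tiny stand-in for older interpreters (an assumption).
STDLIB_MODULES = getattr(
    sys, "stdlib_module_names", frozenset({"os", "sys", "pathlib", "typing"})
)
GROUP_RANK = {"stdlib": 0, "third_party": 1, "local": 2}


def classify_import(module_name, local_packages=("zol",)):
    """Return 'stdlib', 'third_party' or 'local' for a dotted module name."""
    top_level = module_name.split(".")[0]
    if top_level in local_packages:
        return "local"
    if top_level in STDLIB_MODULES:
        return "stdlib"
    return "third_party"


def imports_are_grouped(source, local_packages=("zol",)):
    """Check that top-level imports appear as stdlib, third-party, local."""
    ranks = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            groups = [
                classify_import(alias.name, local_packages)
                for alias in node.names
            ]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) are always treated as local.
            if node.level:
                groups = ["local"]
            else:
                groups = [classify_import(node.module, local_packages)]
        else:
            continue
        ranks.extend(GROUP_RANK[group] for group in groups)
    # Correct grouping means the ranks never decrease.
    return ranks == sorted(ranks)
```

Because the source is only parsed, never imported, the check works even when third-party packages such as `numpy` are not installed.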

# ✅ Correct
import os
import sys
from pathlib import Path
from typing import Dict, List, Optional

import numpy as np
import pandas as pd
from Bio import SeqIO

from zol import util
from zol.orthologs import findOrthologs

- Prefer f-strings over other formatting methods
- Use `.format()` for complex formatting
- Avoid `%` formatting

# ✅ Preferred
name = "World"
print(f"Hello, {name}!")

# ✅ For complex cases
print("Value: {:.2f}".format(3.14159))

# ❌ Avoid
print("Hello, %s!" % name)

- Use `snake_case` for variables and functions
- Use `UPPER_CASE` for constants
- Use `PascalCase` for classes
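
The three conventions above can be written down as patterns, which is handy when reviewing a long list of identifiers by hand. This is a rough sketch under simple assumptions: the function name `naming_style` and the exact regular expressions are illustrative, and single-word all-caps names such as `DNA` are reported as `UPPER_CASE` even though they could also be class names.

```python
import re

# Illustrative patterns; they approximate the conventions, nothing more.
SNAKE_CASE = re.compile(r"[a-z_][a-z0-9_]*")
UPPER_CASE = re.compile(r"[A-Z][A-Z0-9_]*")
PASCAL_CASE = re.compile(r"[A-Z][a-zA-Z0-9]*")


def naming_style(name):
    """Return the convention a name follows, or 'other' if none match."""
    if SNAKE_CASE.fullmatch(name):
        return "snake_case"
    # UPPER_CASE is tested before PascalCase because a name like "A"
    # would match both patterns.
    if UPPER_CASE.fullmatch(name):
        return "UPPER_CASE"
    if PASCAL_CASE.fullmatch(name):
        return "PascalCase"
    return "other"
```

A mixed-case name such as `findOrthologs` falls into `other`, which flags it for a closer look rather than declaring it wrong.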

# ✅ Correct
def calculate_gene_cluster_score():
    pass


MAX_SCORE = 100
gene_cluster_name = "cluster_001"


class GeneClusterAnalyzer:
    pass

- Use type hints for function parameters and return values
- Import types from the `typing` module

# ✅ Correct
from typing import Dict, List, Optional, Union


def process_genbank_file(
    file_path: str,
    output_dir: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Process GenBank file and return results."""
    pass

src/zol/
├── __init__.py
├── data_dictionary.py
├── fai.py
├── util.py
├── zol.py
└── orthologs/
    ├── __init__.py
    ├── findOrthologs.py
    └── ...
Each Python file should follow this structure:
- Module docstring
- Imports
- Constants
- Classes
- Functions
- Main execution block (if applicable)
"""
Module for processing GenBank files.

This module provides utilities for parsing and analyzing GenBank format files
used in the ZOL pipeline.
"""

import os
from pathlib import Path
from typing import Dict, List, Optional

import numpy as np
from Bio import SeqIO

# Constants
DEFAULT_THREADS = 1
MAX_MEMORY_GB = 16


# Classes
class GenBankProcessor:
    """Process GenBank files for analysis."""

    def __init__(self, input_dir: str):
        self.input_dir = Path(input_dir)


# Functions
def parse_genbank_file(file_path: str) -> Dict[str, str]:
    """Parse a GenBank file and return sequence data."""
    pass


def main() -> None:
    """Entry point when the module is run as a script (placeholder)."""
    pass


# Main execution
if __name__ == "__main__":
    main()

# ✅ Good names
gene_cluster_count = 0
calculate_ortholog_groups()
process_diamond_results()

# ❌ Poor names
gc_count = 0
calc_og()

# ✅ Constants in UPPER_CASE
DEFAULT_THREADS = 1
MAX_MEMORY_GB = 16
DIAMOND_SENSITIVITY = "very-sensitive"

# ✅ PascalCase for classes
class GeneClusterAnalyzer:
    pass


class OrthologGroupFinder:
    pass

Use Google-style docstrings for all public functions and classes:
def process_genbank_files(
    input_directory: str,
    output_directory: str,
    threads: int = 1,
) -> Dict[str, List[str]]:
    """Process GenBank files in the input directory.

    Args:
        input_directory: Path to directory containing GenBank files.
        output_directory: Path to directory for output files.
        threads: Number of threads to use for processing.

    Returns:
        Dictionary mapping file names to processed results.

    Raises:
        FileNotFoundError: If input directory doesn't exist.
        ValueError: If threads is less than 1.
    """
    pass

- Use comments sparingly
- Explain "why" not "what"
- Keep comments up to date

# ✅ Good comment
# Skip processing if no valid files found
if not valid_files:
    return

# ❌ Poor comment
# Set x to 5
x = 5

# ❌ Mixed tabs and spaces
def function():
	# This line uses tabs
    # This line uses spaces
    pass

# ✅ Consistent spaces
def function():
    # All lines use 4 spaces
    pass

# ❌ Too long
result = very_long_function_name_with_many_parameters(param1, param2, param3, param4, param5, param6, param7, param8)
# ✅ Properly formatted
result = very_long_function_name_with_many_parameters(
    param1,
    param2,
    param3,
    param4,
    param5,
    param6,
    param7,
    param8,
)

# ❌ Wildcard imports
from zol import *
# ✅ Specific imports
from zol import util, fai
from zol.orthologs import findOrthologs

# ❌ Old Path object usage (if removing pathlib support)
file_path = Path("dir") / "file.txt"

# ✅ String concatenation (the separator must be included explicitly)
file_path = "dir" + "/" + "file.txt"

# ✅ os.path.join (alternative)
import os

file_path = os.path.join("dir", "file.txt")

Before committing code, ensure:
- Code follows PEP 8 style guidelines
- All imports are properly organized
- Type hints are used where appropriate
- Docstrings are present for public functions
- Variable names are descriptive and follow conventions
- No hardcoded paths or magic numbers
- Error handling is appropriate
- No unused imports or variables
- Line length does not exceed 88 characters
- Proper indentation (4 spaces, no tabs)
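
The last two checklist items, line length and indentation, are easy to miss by eye. As a hedged illustration, a small helper along these lines could flag them before a commit; the function name `find_layout_issues` is hypothetical and the limit of 88 is taken from this guide.

```python
MAX_LINE_LENGTH = 88  # limit used in this guide


def find_layout_issues(source, max_length=MAX_LINE_LENGTH):
    """Return (line_number, message) pairs for tabs and overlong lines."""
    issues = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        # The leading whitespace is whatever lstrip() would remove.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            issues.append((line_number, "tab in indentation"))
        if len(line) > max_length:
            issues.append(
                (line_number, f"line longer than {max_length} characters")
            )
    return issues
```

Reading a file with `Path(path).read_text()` and passing the result to the helper gives a list that is empty when both checks pass.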
In summary, the best practices are:
- Follow naming conventions consistently
- Write clear docstrings for all public functions
- Use type hints to improve code clarity
- Keep functions small and focused
- Use meaningful variable names
- Handle errors appropriately
- Test your code thoroughly
- Review code before committing
- Keep dependencies updated
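
Two of the practices above, docstrings and type hints on public functions, can be spot-checked with the standard `ast` module. The sketch below is only one possible approach: the helper name `find_undocumented_functions` is made up for this example, "public" is assumed to mean "name does not start with an underscore", and only the return annotation is inspected.

```python
import ast


def find_undocumented_functions(source):
    """List public functions lacking a docstring or a return annotation."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Names with a leading underscore are treated as private and skipped.
        if node.name.startswith("_"):
            continue
        if ast.get_docstring(node) is None:
            problems.append((node.name, "missing docstring"))
        if node.returns is None:
            problems.append((node.name, "missing return annotation"))
    return problems
```

Methods such as `__init__` are skipped by the underscore rule, so the result focuses on the functions a reader of the module is most likely to call.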
Remember: The goal is readable, maintainable code. When in doubt, prioritize clarity over brevity.