Writing good programs
Best practices in programming help you to do a more efficient and better science
…if experimentalists don’t calibrate their instruments, check the purity of the reagents, take careful notes, keep tubes clean, etc. what they’re doing isn’t considered science.
Why should we consider science rarely reproducible results of non tested programs?
Software carpentry
(best practices to write good programming projects)
The Software Carpentry Foundation is a non-profit volunteer organisation whose members teach researchers basic software skills
Software carpentry methods focuse on small-scale and immediately practical issues.
Become faster and more efficient
Share your work without pain
Collaborate more effectively
Trust your results
Reproduce your results
In other words…
Do a better science
Source Software Carpentry
Best practices to write better programs (or entire projects)
- Design the project upfront
- Write clean, transparent programs
- Document your programs
- Use data standards
Quality control - Use a version control system (repository)
- Test your programs
- Debug your code
Cycle of software development - Release your code
- Code review
Design the project upfront
Good scientists do not perform experiments before developing a hypothesis and planning materials and methods to test that hypothesis
Bioinformaticians often start coding with very little or no planning at all
Before the first line of code is written, software projects should be thoughtfully designed
Amfahr J, Bustamante A, Rome P. Exploring Agile: The Seapine Agile Expedition
How much design?
What will the program(s) do? and How will the results produced by the program be verified?
The most simple design documents describe inputs, how those inputs will be transformed by the program(s), and outputs.
Identifying the appropriate technologies or programming languages is a vital decision during the design phase
Techniques to write clean, transparent programs
- Break down your project into tasks
- Break up programming tasks into smaller steps
- Separate Input, Processing, and Output
- Use docstrings and comments
- Write smaller programs
- Format your source code
def read_fasta_files(directory):
'''
Reads a directory with many FASTA files containing protein sequences.
'''
pass
def filter_phe(sequences):
'''
Removes all sequences that do not have Phenylalanine or Phe in the name.
'''
pass
def remove_duplicates(sequences):
'''
Remove all sequence records, having an identical sequence.
'''
pass
def write_fasta(sequences, filename):
'''
Writes a single FASTA file with aaRS sequences.
'''
pass
if __name__ == '__main__':
INPUT_DIR = 'aars/'
OUTPUT_FILE = 'phe_filtered.fasta'
seq = read_fasta_files(INPUT_DIR)
filter_phe(seq)
remove_duplicates(seq)
write_fasta(seq, OUTPUT_FILE)
Write smaller programs
Simpler is better
When you can achieve your goals without using functions, do it!
When you can do something without complex data structures, do it!
When you can cut the size of your program by half, and it still does the job, do it!
Ultimately, the best program ever is one that gets the job done
with a single line of well-readable code.
Refactoring is the practice of taking small chunks of code (i.e., at the unit level) and frequently rebuilding it so that it is as simple as possible, yet still delivers the expected value.
Write programs in such a way that errors are easy to find
“Today I had a successful day at programming: I deleted 300 lines of code.” (Lorenzo Catucci)
Format your source code using coding guidelines
- Convention for naming variables, functions, classes, files
- Establishes where and how to write comments
- Excludes certain language constructs
- Enhance the readability of source code and avoid common coding errors
- Avoids overadjusting code to your personal preference
- Discourage using complex language constructs
Example: PEP8 coding guidelines and the pylint tool
Document your programs
One of the foundations of scientific research is the lab notebook, where materials, methods, and results are recorded so that experiments can be repeated.
Similarly, all computer programs should be well-documented, modular, and easy to read and follow even by users who did not write the program.
An acceptable level of documentation might provide:
- help through a user-guide
- information on how to compile and execute the program
- in-line comments describing program functions and modules
Amfahr J, Bustamante A, Rome P. Exploring Agile: The Seapine Agile Expedition
Use data standard
Input and output of your programs should comply community-accepted data standards
If these are not available
Store format information in the data itself
i.e. include documentation (metadata) describing the data
- Format of the data
- Definitions and assumption that allow interpreting the data
Use a quality control process to ensure reproducibility of results
Reproducibility requires three things:
Version control system (repository)
my_program.py
my_program2.py
my_program3.py
my_program3_optimised.py
my_program3_new.py
my_program3_new_new.py
A code repository (or version control) manages files that are being used and modified by different persons
- Keeps track of which is the most recent version of a particular file
- Notices whenever two people try to change the same version of a file
- Prevents changes that overwrite each other
- A project can be reverted to any earlier state
- Multiple independent subprojects can be managed simultaneously
A repository can include programs, data, planning documents, related publications
It is crucial for controlling exactly which version of a software, which set of input files, which parameters were used to produce a specific set of results and at what time
Examples of version control systems:
- Concurrent version system (CVS) (http://cvs.nongnu.org)
- Subversion (SVN) (http://subversion.apache.org)
- GIT (http://git-scm.com)
- Bazaar (http://bazaar.canonical.com/en/)
- Mercurial (http://mercurial.selenic.com/wiki/Mercurial )
hg init #keep track of files in the directory
hg add phe_sequences.py #add each file to the version control system explicitly
hg ci #commit after changes (describe briefly changes)
hg log aars_filter.py #return to an earlier version
hg up -r 23 #use the varsion number to update files
hg clone https://bitbucket.org/myproject/ #creates a repository
hg push #send the files to the server
hg pull #you can update changes from the server
hg up #to your computer
Test your code
“Code without test is broken by design” (Jacob Kaplan Moss)
- Manual tests
- Automated tests
- Sometimes, this is as simple as running it on a sample data you actually want to transform, and eyeballing the results
- Create small data files or artificial input data for testing
- In more complex situations, you may select subsets of your actual data, and make sure that they’re handled correctly.
Back
Back to main page.