Changing the LibreOffice bibliography database

Goal: replace or update the default biblio database in LibreOffice with a user-defined database.

For citations and building a bibliography in LibreOffice Write, please see this previous post

  1. Find the default bibliography database. On my Mac, the LibreOffice bibliography database, biblio.odb, is located at [username]/Library/Application Support/LibreOffice/4/user/database/
  2. Rename the file, e.g., default_biblio.odb
  3. Export the database from JabRef. I tried several options; the best results came from saving the database as an sxc spreadsheet, a format LibreOffice can open.
  4. Import the spreadsheet file into LibreOffice Base and save it as an odb database.
  5. In LibreOffice, choose File → New → Database.
  6. This brings up the Database Wizard. Select Connect to an existing database, and select “Spreadsheet” from the drop-down menu.
  7. Click “Next” to continue. Browse to find the spreadsheet file, then press Next again. Accept the defaults (Yes, register the database for me, Open the database for editing), and click Finish to save the database.
  8. LibreOffice Base will open and display your references. Make changes as needed. Then quit Base and proceed to register the new database with LibreOffice, as follows.
  9. LibreOffice → Preferences → LibreOffice → Base → Databases.
  10. Click on “New” and browse to the location of your saved bibliography.
  11. Then, update the entry in Registered name so that it reads “Bibliography”
  12. Click OK. If all goes well, when you access references in LibreOffice Write via the Tools → Bibliography database command, then your bibliography will be present.

JabRef citations and bibliography in LibreOffice

This post is about citations and bibliography in LibreOffice Write. For citations in Word, see this post. For instructions for updating the bibliography database in LibreOffice, see this post.

JabRef (ver 3.8.2) & LibreOffice Write (version 5.3.3.2)

This method uses the JabRef plugin for OpenOffice/LibreOffice Write.

  1. Java must be enabled. Go to LibreOffice → Preferences → Advanced. Provided you have a Java runtime environment (JRE) installed, it will show in the popup window. Simply click on the button next to it, then OK to save the changes. Restart LibreOffice before proceeding.
  2. Start LibreOffice and write your manuscript. Keep your document in the odt format native to LibreOffice.
  3. Return to JabRef and start the OpenOffice/LibreOffice connection: Tools → OpenOffice/LibreOffice connection. The tool will appear along the left-hand side.
  4. Next, connect JabRef to your LibreOffice app. There are two options: Automatic and Manual. I’ve yet to get the automatic option to work, so Manual it is. Click on the Manual icon (second from left). A window pops up; enter the path to your installation of LibreOffice. On my Mac, that would be /Applications/LibreOffice.app. Click OK. View status of the connection at the bottom-left of the JabRef GUI.
  5. If all goes well, you’ll get a prompt to select the LibreOffice document. You can also select the document via the darkened folder icon (third from left). Status of the connection is reported bottom-left of the JabRef screen.
  6. Now, insert a citation by placing the cursor in text and then switching to JabRef.

Highlight the reference in JabRef, then click on the Cite button. Here’s what you should get

Report on ozone and toad innate immune function [Dohm et al. 2005]

  1. To generate the Bibliography, place cursor where you want the item to appear in the document, then switch back to JabRef.
  2. Highlight the reference, or references, then click on the refresh button (fourth from left). Here’s what a single reference will look like

References

Dohm, M. R.; Mautz, W. J.; Andrade, J. A.; Gellert, K. S.; Salas-Ferguson, L. J.; Nicolaisen, N. and Fujie, N. (2005). Effects of ozone exposure on non-specific phagocytic capacity of pulmonary macrophages from an amphibian, Bufo marinus, Environmental Toxicology and Chemistry 24 : 205-210.

Using LibreOffice Bibliography database

  1. Assuming you have incorporated your own references into the database (see this post for instructions), then select Insert → Table of contents and index → Bibliography Entry. A popup window will appear from which you select your citation from your bibliography database.
  2. Click insert and then close to return to your manuscript. If all goes well, then you will see

Type something here [Adams2008]

  1. To generate the bibliography for the paper, select Insert → Table of contents and index → Table of Contents, Index, or Bibliography. A pop up menu appears.
  2. Select “Bibliography” from the type drop down menu. At least to start, accept the defaults and press the “OK” button. If all goes well, then you’ll see

Bibliography

Adams2008: Adams, Dean C., Phylogenetic meta-analysis, 2008

[note — need to fix my database!]

Papers3 + LibreOffice

  1. As described in a previous post, just press the ctrl key twice in succession to bring up a citation manager. Instructions as before. Here’s the output

The citations in text:

A paper about social networks and scientists {Hall:2014db}

Another paper, introducing Richard Hamming’s address to a Bell Research seminar group {Erren:2007hl}

  1. And the bibliography, as described in the previous post

Erren, T. C. (2007). Ten simple rules for doing your best research, according to Hamming. PLoS Computational Biology, 3(10), 1839. http://doi.org/10.1371/journal.pcbi.0030213

Hall, N. (2014). The Kardashian index: a measure of discrepant social media profile for scientists. Genome Biology, 15(7), 424. http://doi.org/10.1186/s13059-014-0424-0

Explore and compare working with JabRef, Papers3, and Word 2008

Over the years I’ve built up a large reference list. The list has now more than 5000 entries, which reflects

  • I’m getting old: I remember when PubMed became available to everyone over the Internet (1997) and when Google Scholar appeared (2004).
  • It’s important to my teaching style to be able to present students with access to the papers used in my lectures.
  • I like to teach my courses (biostatistics, genetics, etc.) from a perspective of historical context.
  • My research interests have changed.

I initially managed the list with the first version of EndNote, continuing through EndNote2 through EndNote5, then spent a few years with the excellent and free-to-use Mendeley until I reached the 2GB limit on the free account. I would have continued to use Mendeley but for their choice of subscription pricing as opposed to my preference for a one-time license purchase. That said, I settled on JabRef and have used it for years. As a reference database, JabRef is tops in my book, plus it’s Open Source and cross-platform software. It’s less convenient for manuscripts when you want to cite and build a bibliography, but it certainly works. JabRef works better with LibreOffice documents than Microsoft Office documents (see this post), but it can work with Word if you use Word’s bibliography functions and a little run-around. Since I’ve recently become interested in Papers, I thought I’d share my notes as I go.

In this posting I present brief instructions for adding citations and a bibliography to a Word 2008 document with combinations of JabRef and/or Papers3.

JabRef (ver 3.8.2) & Papers3

Goal: Import selected reference list into Papers3 from JabRef

  1. From JabRef, select one or more references. File → Save selected as… and save as a BibTeX *.bib database.
  2. Start Papers3
  3. File→ Import→ BibTex library

Note: Papers3 will import my entire JabRef database, but that’s unnecessary for the example.

Papers3 + Word 2008

Goal: cite as you type; append bibliography to manuscript

  1. Type away in Word 2008
  2. When ready to insert a citation(s), press the control key twice. This brings up a Papers3 Search window. Notably, Papers3 does not have to be running.
  3. Enter relevant search word(s) and papers will show up in the results window. Among the nice features of Papers3, you can view the article.
  4. Click on Insert Citation, and it returns back to Word 2008
And here’s what it would look like in the Word document:

Here’s a citation of a Phylogenetics book {Adams, 2008}. Here’s a citation of an article on women in statistics {Anderson, 1992}.

  1. Once you are finished adding in citations, generate the bibliography by invoking the Papers3 search again (ctrl+ctrl), then select “Format manuscript” (Appendix).
  2. Nice to know: This method also works with the free Word Online. Since you would otherwise need a full install of Word/Office in order to insert citations and generate a bibliography, Papers3 may be a nice alternative. The method also works with Google Docs; again, there are options to gain this type of function in Google Docs, but they are either limited or cost money (e.g., the free version of the EasyBib Add-on has very limited function).

JabRef + Word 2008

  1. From JabRef, select multiple papers. Export selected entries and save file as Sources.xml to Documents → Microsoft User Data
  2. Start Word 2008
  3. Type away.
  4. When ready to insert citation, bring up Citations Toolbox (View→ Toolbox→ Citations)
  5. Click on the Settings icon (bottom right of popup menu); Select Citation Source Manager
  6. If all is well, you will see the sources listed in the Master List panel. Select the references required and copy them to the Current List panel. Hold down Command key to select multiple references.

And here’s what it would look like in the Word document:

Here’s a citation of a Phylogenetics book (Adams, 2008). Here’s a citation of an article on women in statistics (Anderson, 1992).

  1. Once you are finished adding in citations, generate the bibliography by placing the cursor in the document where you want the bibliography to appear (e.g., end of document), then select from the menu bar: Insert → Document Elements→ Bibliography (Appendix).

Conclusion

Papers3 and Word 2008 are much easier to use together, although the results are pretty much the same. Frankly, if this is all you are looking to do with Papers, then it is probably not worth the additional cost. (However, Papers can do a lot more; I particularly like how it helps you manage all kinds of documents on your computer.)

Appendix

Output from Word 2008, Insert → Document Elements→ Bibliography

Bibliography

Adams, D. C. (2008). Phylogenetic meta-analysis. Evolution , 62, 567-572.

Anderson, M. (1992). The history of women and the history of statistics. Journal of Women’s History , 4.

Output from Papers3, “Format manuscript”

Adams, D. C. (2008). Phylogenetic meta-analysis. Evolution, 62, 567–572.

Anderson, M. (1992). The history of women and the history of statistics. Journal of Women’s History, 4(1).

 

LD50 & dose response calculations with R

In toxicology, the dose of a pathogen, radiation, or toxin required to kill half the members of a tested population of animals or cells is called the lethal dose, 50%, or LD50. This measure is also known as the lethal concentration, LC50, or properly after a specified test duration, the LCt50 indicating the lethal concentration and time of exposure. LD50 figures are frequently used as a general indicator of a substance’s acute toxicity. A lower LD50 is indicative of increased toxicity.

More generally, the methods described in this chapter can be used to find the dose at which 50% of studied organisms respond to a range of doses of a substance (e.g., agonist, antagonist, inhibitor, etc.), for any response from a change in behavior or life history characteristics up to and including death. The procedures outlined below assume that there is but one inflection point, i.e., an “s-shaped” curve, either up or down; if there is more than one inflection point, then the logistic equations described will not fit the data well and other choices need to be made (see Di Veroli et al. 2015). We will use the drc package (Ritz et al. 2015).

Example

After starting R, load the drc library.

library(drc)

Consider some hypothetical 24-hour survival data for yeast exposed to salt solutions. Let resp equal the variable for frequency of survival (e.g., estimated from OD600 readings) and NaCl equal the millimolar (mM) salt concentrations or doses.

At the R prompt type

resp=c(1,1,1,.9,.7,.3,.4,.2,0,0,0)
NaCl=seq(0,1000,100)
#Check that the sequence has been created correctly; alternatively, just enter the values.
NaCl
#[1] 0 100 200 300 400 500 600 700 800 900 1000
#Make a plot
plot(NaCl,resp,pch=19,cex=1.2,col="blue",xlab="NaCl [mM]",ylab="Survival frequency")

And here is the plot of the simulated data (Figure 1).

Figure 1. Plot of yeast survival in different amounts of salt, simulated data

Note the sigmoidal shape: we’ll need a logistic equation to describe the relationship between survival of yeast and NaCl doses.

The equation for the four parameter logistic curve, also called the Hill-Slope model, is (Figure 2)

Figure 2. Equation of the four parameter logistic curve.

where c is the parameter for the lower limit of the response, d is the parameter for the upper limit of the response, e is the relative EC50, the dose fitted half-way between the limits c and d, and b is the relative slope around the EC50. The slope, b, is also known as the Hill slope. Because this experiment included a dose of zero, a three parameter logistic curve would be appropriate. The equation simplifies to (Figure 3)

Figure 3. Equation of the three parameter logistic curve
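Since the equations are shown only as figures, here is a minimal sketch of the same curves written as plain R functions. It follows the parameterization described above (b slope, c lower limit, d upper limit, e the dose half-way between the limits), which is also how the drc documentation describes LL.4(). The function names ll4 and ll3 and the parameter values are my own, chosen only for illustration.

```r
# Four parameter log-logistic curve, base R only
# b = slope, c = lower limit, d = upper limit, e = dose half-way between c and d
ll4 <- function(x, b, c, d, e) {
  c + (d - c) / (1 + (x / e)^b)
}

# Illustrative parameter values (assumed, not fitted to any data)
b <- 4
lower <- 0
upper <- 1
ec50 <- 500

# At a dose equal to e the response is half-way between the limits
half <- ll4(ec50, b, lower, upper, ec50)
half

# Fixing the lower limit c at 0 gives the three parameter curve
ll3 <- function(x, b, d, e) ll4(x, b, c = 0, d = d, e = e)
ll3(c(0, 250, 500, 1000), b, upper, ec50)
```

With a positive slope b the curve runs from the upper limit at dose zero down toward the lower limit, which is the shape of the yeast survival data.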

EC50 from 4 parameter model

First, make a data frame

dose = data.frame(resp, NaCl)

Next, call up a function, drm, from the drc library and specify the model as the four parameter logistic equation, specified as LL.4(). We follow with a call to the summary command to retrieve output from the drm function. Note that we are using the four-parameter logistic equation (Figure 2)

model.dose1 = drm(resp ~ NaCl, data = dose, fct = LL.4())
summary(model.dose1)

And here is the output from R.

Model fitted: Log-logistic (ED50 as parameter) (4 parms)

Parameter estimates:

                   Estimate  Std. Error    t-value  p-value
b:(Intercept)      3.753415    1.074050   3.494636   0.0101
c:(Intercept)     -0.084487    0.127962  -0.660251   0.5302
d:(Intercept)      1.017592    0.052460  19.397441   0.0000
e:(Intercept)    492.645128   47.679765  10.332373   0.0000

Residual standard error:

0.0845254 (7 degrees of freedom)

The EC50 (technically the LD50, because the data were for survival) is e = 492.65 mM NaCl, where e, again, is the dose fitted half-way between the limits c and d.

You should always plot the predicted line from your model against the real data and inspect the fit.

At the R prompt type

lines(NaCl, predict(model.dose1), col="red")

As long as the plot you made in earlier steps is still available, R will add the line specified in the lines command. Here is the plot with the predicted logistic line displayed (Figure 4).

Figure 4. Plot with the predicted logistic line displayed

While there are additional steps we can take to decide whether the logistic curve fits the data well, visual inspection suggests that the curve fits the data reasonably well.

More work to do

Because the EC50 is an estimate, we should also obtain confidence intervals. The drc library provides a function called ED which will accomplish this. We can also ask for the doses that give 10% and 90% of the response in addition to 50%, along with the confidence intervals for each.

At the R prompt type

ED(model.dose1,c(10,50,90), interval="delta")

And the output is shown below.

Estimated effective doses
(Delta method-based confidence interval(s))

     Estimate Std. Error   Lower   Upper
1:10  274.348     38.291 183.803  364.89
1:50  492.645     47.680 379.900  605.39
1:90  884.642    208.171 392.395 1376.89

Thus, the 95% confidence interval for the EC50 calculated from the four parameter logistic curve was between a lower limit of 379.9 and an upper limit of 605.39 mM NaCl.
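As a check on where these limits come from, here is a small sketch in base R. It assumes the interval is the estimate plus or minus a t quantile times the standard error, with the residual degrees of freedom from the model summary; that assumption reproduces the reported limits closely, but I have not confirmed it against the drc source. The variable names are mine.

```r
# Values copied from the ED() output above (four parameter model, 1:50 row)
est <- 492.645
se  <- 47.680
df  <- 7   # residual degrees of freedom reported by summary(model.dose1)

# t-based 95% interval: estimate +/- t quantile * standard error
tcrit <- qt(0.975, df)
ci <- est + c(-1, 1) * tcrit * se
round(ci, 2)
```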

EC50 from 3 parameter model

Looking at the summary output from the four parameter logistic function we see that the value for c was -0.085 and p-value was 0.53, which suggests that the lower limit was not statistically different from zero. We would expect this given that the experiment had included a control of zero mM added salt. Thus, we can explore by how much the EC50 estimate changes when the additional parameter c is no longer estimated by calculating a three parameter model with LL.3(). At the R prompt type

model.dose2 = drm(resp ~ NaCl, data = dose, fct = LL.3())
summary(model.dose2)

And here is the output.

Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)

Parameter estimates:

               Estimate Std. Error   t-value p-value
b:(Intercept)   4.46194    0.76880   5.80378   4e-04
d:(Intercept)   1.00982    0.04866  20.75272   0e+00
e:(Intercept) 467.87842   25.24633  18.53253   0e+00

Residual standard error:

0.08267671 (8 degrees of freedom)

The EC50 is the value of e: 467.88 mM NaCl.
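If the drc package is not at hand, the same three parameter curve can be fitted with nls() from base R as a cross-check. This is only a sketch: the starting values are rough guesses read off the plot, and nls() may fail to converge with poorer guesses. If it converges, the estimates should land close to the drm values above, since both are least squares fits of the same curve.

```r
# Same simulated data as above
resp <- c(1, 1, 1, .9, .7, .3, .4, .2, 0, 0, 0)
NaCl <- seq(0, 1000, 100)

# Three parameter log-logistic curve with the lower limit fixed at 0
# b = slope, d = upper limit, e = EC50
# Starting values are guesses from the plot (assumptions)
fit <- nls(resp ~ d / (1 + (NaCl / e)^b),
           start = list(b = 4, d = 1, e = 450))

# Parameter estimates; e is the EC50
coef(fit)
```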

How do the four and three parameter models compare? We can rephrase this as a statistical test of fit: which model fits the data better, a three parameter or a four parameter model?

At the R prompt type

anova(model.dose1, model.dose2)

The output is below

1st model
fct:      LL.3()
2nd model
fct:      LL.4()

ANOVA table

          ModelDf      RSS Df F value p value
2nd model       8 0.054684
1st model       7 0.050012  1  0.6539  0.4453

Because the p-value is much greater than 5% we may conclude that the fit of the four parameter model was not significantly better than the fit of the three parameter model. Thus, based on your criteria of model fit (e.g., select a more complicated model if it demonstrates an improvement over a model with fewer predictors), we would conclude that the three parameter model is the preferred model.
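The F test in the table can be reproduced by hand from the residual sums of squares (RSS) and degrees of freedom. The sketch below uses the standard F test for nested least squares models, with the rounded values copied from the ANOVA table; the variable names are mine.

```r
# Residual sums of squares and degrees of freedom from the ANOVA table
rss3 <- 0.054684; df3 <- 8   # three parameter model
rss4 <- 0.050012; df4 <- 7   # four parameter model

# F = (drop in RSS per extra parameter) / (residual variance of the larger model)
Fval <- ((rss3 - rss4) / (df3 - df4)) / (rss4 / df4)

# Upper tail probability of the F distribution
pval <- pf(Fval, df3 - df4, df4, lower.tail = FALSE)
round(c(F = Fval, p = pval), 4)
```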

The plot below (Figure 5) now includes the fit of the four parameter model (red line) and the three parameter model (green line) to the data.

Figure 5. Plot now includes the fit of the four parameter model (red line) and the three parameter model (green line) to the data.

The R command to make this addition to our active plot was

lines(NaCl, predict(model.dose2), col="green")

We continue with our analysis of the three parameter model and produce the confidence intervals for the EC50 (modify the ED() statement above for model.dose2 in place of model.dose1).

Estimated effective doses
(Delta method-based confidence interval(s))

     Estimate Std. Error   Lower  Upper
1:10  285.937     33.154 209.483 362.39
1:50  467.878     25.246 409.660 526.10
1:90  765.589     63.026 620.251 910.93

Thus, the 95% confidence interval for the EC50 calculated from the three parameter logistic curve was between a lower limit of 409.7 and an upper limit of 526.1 mM NaCl. The difference between upper and lower limits was 116.4 mM NaCl, a smaller difference than the interval calculated for the 95% confidence intervals from the four parameter model (225.5 mM NaCl). This demonstrates the estimation trade-off: estimating more parameters reduces the confidence in any one parameter estimate.
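The 10%, 50%, and 90% doses in the table follow directly from the fitted b and e. For the three parameter curve with lower limit zero, solving the curve for the dose gives e * (p / (100 - p))^(1 / b). The sketch below applies that formula to the estimates from the summary output; it is my own derivation from the curve rather than anything taken from the drc code, and it returns values close to the three estimates in the table.

```r
# Estimates copied from summary(model.dose2)
b <- 4.46194
e <- 467.87842

# Dose for effect level p (in percent), three parameter log-logistic curve
p <- c(10, 50, 90)
ed <- e * (p / (100 - p))^(1 / b)
round(ed, 1)
```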

Additional notes on EC50 calculations

Care must be taken that the model fits the data well. What if we did not have observations throughout the range of the sigmoidal shape? We can explore this by taking a subset of the data. At the R prompt type

dd = dose[1:6,]

Here, all values greater than dose 500 were dropped (see below for a more general approach to subset).

> dd
resp dose
1  1.0    0
2  1.0  100
3  1.0  200
4  0.9  300
5  0.7  400
6  0.3  500

and the plot does not show an obvious sigmoidal shape (Figure 6)

Figure 6. Plot of subset of data. No longer “sigmoidal” curve.

We run the three parameter model again, this time on the subset of the data.

model.dosedd = drm(resp ~ NaCl, data = dd, fct = LL.3())
summary(model.dosedd)

The output is

Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)

Parameter estimates:

                Estimate Std. Error    t-value p-value
b:(Intercept)   6.989842   0.760112   9.195801  0.0027
d:(Intercept)   0.993391   0.014793  67.153883  0.0000
e:(Intercept) 446.882542   5.905728  75.669344  0.0000

Residual standard error:

0.02574154 (3 degrees of freedom)

Conclusion? The estimate is different, but only just so, 447 vs. 468 mM NaCl. Thus, within reason, the drm function in drc performs well for the calculation of EC50. Not all tools available to the student will do as well.

A more general way to take the subset is with the subset function:

dd = subset(dose, NaCl <= 500)
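To confirm that the two ways of taking the subset agree, here is a short self-contained check in base R. It rebuilds the example data frame and compares row indexing with subset(); the object names dd_index and dd_subset are mine.

```r
# Rebuild the example data frame
resp <- c(1, 1, 1, .9, .7, .3, .4, .2, 0, 0, 0)
NaCl <- seq(0, 1000, 100)
dose <- data.frame(resp, NaCl)

# Subset by row position and by condition on the dose
dd_index  <- dose[1:6, ]
dd_subset <- subset(dose, NaCl <= 500)

# TRUE if both approaches keep the same rows
isTRUE(all.equal(dd_index, dd_subset))
```

The condition-based version keeps working if rows are added or reordered, which is why I call it the more general approach.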

References

Di Veroli G. Y., Fornari C., Goldlust I., Mills G., Koh S. B., Bramhall J. L., Richards, F. M., Jodrell D. I. (2015) An automated fitting procedure and software for dose-response curves with multiphasic features. Scientific Reports 5: 14701. (doi: 10.1038/srep14701)

Ritz, C., Baty, F., Streibig, J. C., Gerhard, D. (2015) Dose-Response Analysis Using R. PLOS ONE 10(12): e0146021. (doi: 10.1371/journal.pone.0146021)

Install R packages

If you know the name of the package, then the easiest way is to use the R console. Just enter the name of the package and set the mirror site. I typically use https://cran.cnr.berkeley.edu.

At the R prompt, type

install.packages("R package", repos="https://cran.cnr.berkeley.edu")

where “R package” is a place holder for any of the packages you wish to install. For example, to download the package Rcmdr, enter

install.packages("Rcmdr", repos="https://cran.cnr.berkeley.edu")

Assuming all goes well you will see some lines appear in red type beginning with the phrase “trying url…” and, if successful, a message in black that begins “The downloaded binary packages are in …”

Note that the package has been installed, but the functions and other features in the package are not available to you until you load the package into your R session. In general, this is accomplished by simply typing at the R prompt

library(Rcmdr) #replace Rcmdr with the name of the package you installed

After pressing the <enter> key, the package will be available for your use.
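When several packages are needed, it can save time to install only the ones that are missing. The sketch below is one way to do that; missing_packages is a hypothetical helper I wrote for this note, not a function from R or any package, and the package names are the ones used in these posts. The install step is wrapped in interactive() so that nothing is downloaded when the script is run unattended.

```r
# Hypothetical helper: return the packages in pkgs that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in these notes
needed <- missing_packages(c("drc", "ape", "spaa"))
needed

# Install only what is missing, and only in an interactive session
# (the mirror is the one used in this post; substitute your own)
if (length(needed) > 0 && interactive()) {
  install.packages(needed, repos = "https://cran.cnr.berkeley.edu")
}
```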

How to install R on Windows 10 PC

  1. Go to https://cran.r-project.org/
  2. Select Download R for Windows
  3. Select “base” and then click on “Download R 3.4.0 for Windows” to get the latest version. As of June 2017 that would be

R-3.4.0

  1. Download the file; once completed, click on the file to begin installation. Accept defaults.
  2. After R has been installed, start the application to work with the R statistical software.

Next: How to install Rcmdr and other R packages

How to install R on your Mac

You must install XQuartz, an X windowing system, before you install R.

Install XQuartz on your Mac

  1. Go to https://www.xquartz.org/
  2. Select the latest version. As of June 2017 that would be

XQuartz-2.7.11.dmg

  1. Download the file; once completed, click on the file to begin installation. Accept defaults.
  2. If your computer replies with a warning message about installing from unknown sources, you’ve run into “Gatekeeper.” Click here for help with Gatekeeper.
  3. After XQuartz has been installed, it is good practice to restart your computer before proceeding.
  4. Now you can install R.

Note: After updating the operating system, it is recommended that you reinstall XQuartz.

Here’s what Apple has to say about why you need to install XQuartz.

Install R on your Mac

  1. Go to https://cran.r-project.org/
  2. Select Download R for (Mac) OS X
  3. Select the latest version. As of June 2017 that would be

R-3.4.0.pkg

  1. Download the file; once completed, click on the file to begin installation. Accept defaults.
  2. If your computer replies with a warning message about installing from unknown sources, you’ve run into “Gatekeeper.” Click here for help with Gatekeeper.
  3. After R has been installed, start the application to work with the R statistical software.

Next: How to install Rcmdr and other R packages

 

R notes: How to get genetic distances from a tree

This note is about extracting the patristic distances from a Newick tree for a set of OTUs. Note: a patristic distance is basically the sum of the branch lengths linking two nodes in a tree.

A number of fantastic packages in R are available to work with phylogeny and sequences. Similarly, a number of folks have been kind enough to share their notes on use of R and phylogenetics — this note derives from their work and is presented here to assist me with teaching students. In no particular order, references used are:

We will use the “ape” package along with a general package called “spaa” which helps manipulate output. The result is a CSV text file with three columns (number, pairs of OTU, distances), columns separated with commas.

In order to use R it must be installed on your computer. Click here for instructions to install R on your Mac computer or here for installation on a Windows 10 computer.

About the example data

In addition to a working version of R on your computer, you need to have saved your tree in Newick format (and recall where the file is on your computer :-). The example tree

(Mouse:0.0604463,((Alligator:0.0407394,Chicken:0.038893):0.0554883,Xenopus:0.216882):0.100985,Human:0.0320104);

accession numbers: NP_001521,  then blastp retrieved NP_001300848, NP_989628, XP_019349624, NP_001080449

aligned sequences by ClustalW (default settings), tree built on distances (Jones-Taylor-Thornton) and Phylip neighbor joining method within Unipro UGENE workbench.

The R script

Start your R application software.

If you have not downloaded and installed ape and spaa, do so now. Click here for instructions for Macs and here for Windows.

Here’s the script in R (the “#” indicates comments and are not interpreted by R — I’ve added blue color to comments). Don’t type the “>”, that’s the R prompt. Type everything after the “>” exactly as written (yes, you can change the object names).

#Get patristic distances. First, load the ape library

> library(ape)
#Load your phylogenetic tree, Newick format. This example is based on Clustal Omega-aligned HIF1A sequences obtained by blastp. Note that you would need to change the text pointing to the folder location
# this command finds the working directory
> getwd()
#use this command to change to your BI308L folder — note, this is just an example, yours will be different!
> setwd("/my BI308L folder/Trees")
#because I set the working directory with setwd, I have access to all files in that folder. Here, I load my newick file
> mytree = read.tree("HIF1A.nwk")
#Check that the tree file loaded correctly by plotting it (see below for the image)
> plot(mytree, type="phylogram", edge.width = 2)
#Add the pairwise distances; A patristic distance is the sum of the lengths of the branches that link two nodes in a tree
> PatristicDistMatrix = cophenetic.phylo(mytree)
#Display the pairwise distances from the tree. A square matrix results. Print the distance matrix.
> PatristicDistMatrix

              Mouse Alligator   Chicken   Xenopus     Human
Mouse     0.0000000 0.2576590 0.2558126 0.3783133 0.0924567
Alligator 0.2576590 0.0000000 0.0796324 0.3131097 0.2292231
Chicken   0.2558126 0.0796324 0.0000000 0.3112633 0.2273767
Xenopus   0.3783133 0.3131097 0.3112633 0.0000000 0.3498774
Human     0.0924567 0.2292231 0.2273767 0.3498774 0.0000000
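To see what the matrix contains, a few of the distances can be checked by hand by adding up the branch lengths along the path between two OTUs. The sketch below uses only base R and the branch lengths from the example Newick tree; the names inner1 and inner2 for the two internal branches are mine.

```r
# Branch lengths copied from the example Newick tree
mouse     <- 0.0604463
human     <- 0.0320104
alligator <- 0.0407394
chicken   <- 0.038893
xenopus   <- 0.216882
inner1    <- 0.0554883  # branch leading to (Alligator, Chicken)
inner2    <- 0.100985   # branch leading to ((Alligator, Chicken), Xenopus)

# Patristic distance = sum of the branch lengths on the path between two tips
d_mouse_human       <- mouse + human
d_alligator_chicken <- alligator + chicken
d_mouse_xenopus     <- mouse + inner2 + xenopus
d_mouse_alligator   <- mouse + inner2 + inner1 + alligator

c(d_mouse_human, d_alligator_chicken, d_mouse_xenopus, d_mouse_alligator)
```

These four sums match the corresponding entries in the matrix printed above.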

Now, I could get impatient and then grab (copy/paste) the distances from the matrix and place into my Excel file. I’d then have to edit the file to get the distances into the correct pair-wise format. A messy step, not recommended.

Continue to read for better solution

Install and load the spaa library

> library(spaa)
> disMatrix <- as.dist(PatristicDistMatrix) #tell R that we are working with a distance object
> outfile <- dist2list(disMatrix)
> outfile #if all goes well, you will see three columns with 25 rows of data like below
col         row     value

1 Mouse     Mouse 0.0000000
2 Alligator Mouse 0.2576590

25 Human    Human 0.0000000
> write.csv(outfile, "outfile.csv") # this command will write a text-only file called outfile.csv to your working directory. You can then import it to Excel or other spreadsheet application. The columns are separated by commas (hence the .csv extension)
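If spaa is not available, a similar long table can be made with base R alone. This is a sketch of an alternative, not what dist2list does internally, and it uses a small made-up 3 x 3 matrix (OTUs A, B, C) in place of the patristic distance matrix.

```r
# Small made-up symmetric distance matrix standing in for PatristicDistMatrix
m <- matrix(c(0, 2, 3,
              2, 0, 4,
              3, 4, 0), nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Convert the square matrix to long format: one row per pair of OTUs
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("row", "col", "value")
long

# Keep each pair once and drop the zero self-distances
pairs <- long[long$row < long$col, ]
pairs
```

The pairs table can be written to a file with write.csv in the same way as outfile above.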

Here’s the plot of “mytree”, unrooted, from R

NJ gene tree (HIF1A), unrooted

My computer is running really slow after loading a new application

I’ve found that many students are hesitant to install software, with fears ranging from not knowing how to install it to the very sensible concern about the safety and integrity of software downloaded from websites they have never heard of. I’ll post at a later time on how I instruct students about checksums and other aspects of verifying software. A quick Google search finds all kinds of advice on such things. Instead, in this post I want to address another interesting aspect of students’ knowledge about their own computers: how to manage the software bloat that comes with new computers and their pre-installed applications.

After installing one or more of these recommended applications I often get complaints from students about how slow their computers have become. Naturally they connect the two: my software slowed their computer. A reasonable conclusion, but not true. While some statistical or bioinformatics routines will tax your personal computer, most of what we will run does not; we run statistics on projects with sample sizes in the 100s and dozens of variables. Even when we run nonlinear estimation routines or matrix manipulations, these procedures are completed in seconds. Similarly, while sequence alignment and other manipulations potentially can tax a computer, the kinds of work we do in these classes rarely will hang a computer for more than a minute or two.

With this as a backdrop, here’s the advice and help I give students.

Assuming you downloaded the software from the appropriate source and checked it against your anti-virus software, the problem of a slow computer is probably not due to the software you just installed. Poor computer performance is more likely because of the number of processes running on your computer.

On Macs you can check for active processes with Activity Monitor (Applications → Utilities folder); on Windows machines use the Task Manager and select Services. Activity Monitor provides an extensive look at your computer, Task Manager less so, but both can be used to stop processes and thus free up system CPU and memory — and make your computer run faster! Some caution here — do a little Google work to look up process names and confirm that you can indeed stop the process without harming your computer.