Title: | Metagenomics Longitudinal Differential Abundance Method |
---|---|
Description: | Identify time intervals of differentially abundant metagenomic features in longitudinal studies (Metwally AA, et al., Microbiome, 2018 <doi:10.1186/s40168-018-0402-y>). |
Authors: | Ahmed A. Metwally, Yang Dai, Patricia W. Finn, David L. Perkins |
Maintainer: | Ahmed A. Metwally <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.8 |
Built: | 2024-11-13 03:59:59 UTC |
Source: | https://github.com/aametwally/metalonda |
Fits longitudinal samples from the same group using negative binomial or LOWESS for all permutations
areaPermutation(perm)
areaPermutation(perm)
perm |
list has all the permutated models |
returns a list of all permutation area ratio
Ahmed Metwally ([email protected])
data(metalonda_test_data) n.sample = 5 # sample size; n.timepoints = 10 # time point; n.perm = 3 n.group= 2 # number of group; Group = factor(c(rep(0,n.sample*n.timepoints), rep(1,n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) aggretage.df = data.frame(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID) perm = permutation(aggretage.df, n.perm = 3, method = "nbinomial", points) areaPermutation(perm)
data(metalonda_test_data) n.sample = 5 # sample size; n.timepoints = 10 # time point; n.perm = 3 n.group= 2 # number of group; Group = factor(c(rep(0,n.sample*n.timepoints), rep(1,n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) aggretage.df = data.frame(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID) perm = permutation(aggretage.df, n.perm = 3, method = "nbinomial", points) areaPermutation(perm)
Fits longitudinal samples from the same group using negative binomial smoothing splines or LOWESS
curveFitting(df, method = "nbinomial", points)
curveFitting(df, method = "nbinomial", points)
df |
dataframe has the Count, Group, ID, Time |
method |
fitting method (nbinomial, lowess) |
points |
points at which the prediction should happen |
returns the fitted model
Ahmed Metwally ([email protected])
data(metalonda_test_data) n.sample = 5 n.timepoints = 10 n.group = 2 Group = factor(c(rep(0, n.sample*n.timepoints), rep(1, n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) aggretage.df = data.frame(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID) cf = curveFitting(df = aggretage.df, method= "nbinomial", points)
data(metalonda_test_data) n.sample = 5 n.timepoints = 10 n.group = 2 Group = factor(c(rep(0, n.sample*n.timepoints), rep(1, n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) aggretage.df = data.frame(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID) cf = curveFitting(df = aggretage.df, method= "nbinomial", points)
Identify significant time intervals
findSigInterval(adjusted.pvalue, threshold = 0.05, sign)
findSigInterval(adjusted.pvalue, threshold = 0.05, sign)
adjusted.pvalue |
vector of the adjusted p-value |
threshold |
p-value cut off |
sign |
vector hold area sign of each time interval |
returns a list of the start and end points of all significant time intervals
Ahmed Metwally ([email protected])
p = c(0.04, 0.01, 0.02, 0.04, 0.06, 0.2, 0.06, 0.04) sign = c(1, 1, 1, 1, -1, -1, 1, 1) findSigInterval(p, threshold = 0.05, sign)
p = c(0.04, 0.01, 0.02, 0.04, 0.06, 0.2, 0.06, 0.04) sign = c(1, 1, 1, 1, -1, -1, 1, 1) findSigInterval(p, threshold = 0.05, sign)
Calculate Area Ratio (AR) of each feature's time interval
intervalArea(curve.fit.df)
intervalArea(curve.fit.df)
curve.fit.df |
gss data object of the fitted spline |
returns the area ratio for all time intervals
Ahmed Metwally ([email protected])
data(metalonda_test_data) n.sample = 5 n.timepoints = 10 n.group= 2 Group = factor(c(rep(0,n.sample*n.timepoints), rep(1,n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) aggregate.df = data.frame(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID) model = curveFitting(df = aggregate.df, method= "nbinomial", points) intervalArea(model)
data(metalonda_test_data) n.sample = 5 n.timepoints = 10 n.group= 2 Group = factor(c(rep(0,n.sample*n.timepoints), rep(1,n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) aggregate.df = data.frame(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID) model = curveFitting(df = aggregate.df, method= "nbinomial", points) intervalArea(model)
Find significant time intervals of the one feature
metalonda( Count, Time, Group, ID, n.perm = 500, fit.method = "nbinomial", points, text = 0, parall = FALSE, pvalue.threshold = 0.05, adjust.method = "BH", time.unit = "days", ylabel = "Normalized Count", col = c("blue", "firebrick"), prefix = "Test" )
metalonda( Count, Time, Group, ID, n.perm = 500, fit.method = "nbinomial", points, text = 0, parall = FALSE, pvalue.threshold = 0.05, adjust.method = "BH", time.unit = "days", ylabel = "Normalized Count", col = c("blue", "firebrick"), prefix = "Test" )
Count |
matrix has the number of reads that mapped to each feature in each sample. |
Time |
vector of the time label of each sample. |
Group |
vector of the group label of each sample. |
ID |
vector of the subject ID label of each sample. |
n.perm |
number of permutations. |
fit.method |
fitting method (nbinomial, lowess). |
points |
points at which the prediction should happen. |
text |
Feature's name. |
parall |
boolean to indicate whether to use multicore. |
pvalue.threshold |
p-value threshold cutoff for identifing significant time intervals. |
adjust.method |
multiple testing correction method. |
time.unit |
time unit used in the Time vector (hours, days, weeks, months, etc.) |
ylabel |
text to be shown on the y-axis of all generated figures (default: "Normalized Count") |
col |
two color to be used for the two groups (eg., c("red", "blue")). |
prefix |
prefix to be used to create directory for the analysis results |
returns a list of the significant time intervals for the tested feature.
Ahmed Metwally ([email protected])
data(metalonda_test_data) n.sample = 5 n.timepoints = 10 n.group = 2 Group = factor(c(rep(0, n.sample*n.timepoints), rep(1,n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) ## Not run: output.nbinomial = metalonda(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID, fit.method = "nbinomial", n.perm = 10, points = points, text = rownames(metalonda_test_data)[1], parall = FALSE, pvalue.threshold = 0.05, adjust.method = "BH", time.unit = "hours", ylabel = "Normalized Count", col = c("black", "green")) ## End(Not run)
data(metalonda_test_data) n.sample = 5 n.timepoints = 10 n.group = 2 Group = factor(c(rep(0, n.sample*n.timepoints), rep(1,n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) ## Not run: output.nbinomial = metalonda(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID, fit.method = "nbinomial", n.perm = 10, points = points, text = rownames(metalonda_test_data)[1], parall = FALSE, pvalue.threshold = 0.05, adjust.method = "BH", time.unit = "hours", ylabel = "Normalized Count", col = c("black", "green")) ## End(Not run)
The dataset is used for testing the MetaLonDA
metalonda_test_data
metalonda_test_data
A data frame with 2 OTUs patterns
Identify significant features and their significant time interval
metalondaAll( Count, Time, Group, ID, n.perm = 500, fit.method = "nbinomial", num.intervals = 100, parall = FALSE, pvalue.threshold = 0.05, adjust.method = "BH", time.unit = "days", norm.method = "none", prefix = "Output", ylabel = "Normalized Count", col = c("blue", "firebrick") )
metalondaAll( Count, Time, Group, ID, n.perm = 500, fit.method = "nbinomial", num.intervals = 100, parall = FALSE, pvalue.threshold = 0.05, adjust.method = "BH", time.unit = "days", norm.method = "none", prefix = "Output", ylabel = "Normalized Count", col = c("blue", "firebrick") )
Count |
Count matrix of all features |
Time |
Time label of all samples |
Group |
Group label of all samples |
ID |
individual ID label for samples |
n.perm |
number of permutations |
fit.method |
The fitting method (nbinomial, lowess) |
num.intervals |
The number of time intervals at which metalonda test differential abundance |
parall |
logic to indicate whether to use multicore |
pvalue.threshold |
p-value threshold cutoff |
adjust.method |
Multiple testing correction methods |
time.unit |
time unit used in the Time vector (hours, days, weeks, months, etc.) |
norm.method |
normalization method to be used to normalize count matrix (css, tmm, ra, log10, median_ratio) |
prefix |
prefix for the output figure |
ylabel |
text to be shown on the y-axis of all generated figures (default: "Normalized Count") |
col |
two color to be used for the two groups (eg., c("red", "blue")). |
Returns a list of the significant features a long with their significant time intervals
Ahmed Metwally ([email protected])
## Not run: data(metalonda_test_data) n.sample = 5 n.timepoints = 10 n.group = 2 Group = factor(c(rep(0, n.sample*n.timepoints), rep(1,n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) output.nbinomial = metalondaAll(Count = metalonda_test_data, Time = Time, Group = Group, ID = ID, n.perm = 10, fit.method = "nbinomial", num.intervals = 100, parall = FALSE, pvalue.threshold = 0.05, adjust.method = "BH", time.unit = "hours", norm.method = "none", prefix = "Test", time.unit = "hours", ylabel = "Normalized Count", col = c("black", "green")) ## End(Not run)
## Not run: data(metalonda_test_data) n.sample = 5 n.timepoints = 10 n.group = 2 Group = factor(c(rep(0, n.sample*n.timepoints), rep(1,n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) output.nbinomial = metalondaAll(Count = metalonda_test_data, Time = Time, Group = Group, ID = ID, n.perm = 10, fit.method = "nbinomial", num.intervals = 100, parall = FALSE, pvalue.threshold = 0.05, adjust.method = "BH", time.unit = "hours", norm.method = "none", prefix = "Test", time.unit = "hours", ylabel = "Normalized Count", col = c("black", "green")) ## End(Not run)
Normalize count matrix
normalize(count, method = "css")
normalize(count, method = "css")
count |
count matrix |
method |
normalization method |
Ahmed Metwally ([email protected])
Permutes the group label of the samples in order to construct the AR empirical distibution
permutation( perm.dat, n.perm = 500, method = "nbinomial", points, lev, parall = FALSE )
permutation( perm.dat, n.perm = 500, method = "nbinomial", points, lev, parall = FALSE )
perm.dat |
dataframe has the Count, Group, ID, Time |
n.perm |
number of permutations |
method |
The fitting method (negative binomial, LOWESS) |
points |
The points at which the prediction should happen |
lev |
the two level's name |
parall |
boolean to indicate whether to use multicore. |
returns the fitted model for all the permutations
Ahmed Metwally ([email protected])
data(metalonda_test_data) n.sample = 5 n.timepoints = 10 n.perm = 3 n.group = 2 Group = factor(c(rep(0, n.sample*n.timepoints), rep(1, n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) aggregate.df = data.frame(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID) prm = permutation(aggregate.df, n.perm = 3, method = "nbinomial", points)
data(metalonda_test_data) n.sample = 5 n.timepoints = 10 n.perm = 3 n.group = 2 Group = factor(c(rep(0, n.sample*n.timepoints), rep(1, n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) aggregate.df = data.frame(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID) prm = permutation(aggregate.df, n.perm = 3, method = "nbinomial", points)
Visualize significant time interval
visualizeArea( aggregate.df, model.ss, method, start, end, text, group.levels, unit = "days", ylabel = "Normalized Count", col = c("blue", "firebrick"), prefix = "Test" )
visualizeArea( aggregate.df, model.ss, method, start, end, text, group.levels, unit = "days", ylabel = "Normalized Count", col = c("blue", "firebrick"), prefix = "Test" )
aggregate.df |
Dataframe has the Count, Group, ID, Time |
model.ss |
The fitted model |
method |
Fitting method (negative binomial or LOWESS) |
start |
Vector of the start points of the time intervals |
end |
Vector of the end points of the time intervals |
text |
Feature name |
group.levels |
Level's name |
unit |
time unit used in the Time vector (hours, days, weeks, months, etc.) |
ylabel |
text to be shown on the y-axis of all generated figures (default: "Normalized Count") |
col |
two color to be used for the two groups (eg., c("red", "blue")). |
prefix |
prefix to be used to create directory for the analysis results |
Ahmed Metwally ([email protected])
Visualize Area Ratio (AR) empirical distribution for each time interval
visualizeARHistogram(permuted, text, method, prefix = "Test")
visualizeARHistogram(permuted, text, method, prefix = "Test")
permuted |
Permutation of the permuted data |
text |
Feature name |
method |
fitting method |
prefix |
prefix to be used to create directory for the analysis results |
Ahmed Metwally ([email protected])
Visualize Longitudinal Feature
visualizeFeature( df, text, group.levels, unit = "days", ylabel = "Normalized Count", col = c("blue", "firebrick"), prefix = "Test" )
visualizeFeature( df, text, group.levels, unit = "days", ylabel = "Normalized Count", col = c("blue", "firebrick"), prefix = "Test" )
df |
dataframe has the Count, Group, ID, Time |
text |
feature name |
group.levels |
The two level's name |
unit |
time interval unit |
ylabel |
text to be shown on the y-axis of all generated figures (default: "Normalized Count") |
col |
two color to be used for the two groups (eg., c("red", "blue")). |
prefix |
prefix to be used to create directory for the analysis results |
Ahmed Metwally ([email protected])
data(metalonda_test_data) pfx = tempfile() dir.create(file.path(pfx)) n.sample = 5 n.timepoints = 10 n.group = 2 Group = factor(c(rep(0, n.sample*n.timepoints), rep(1, n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) aggregate.df = data.frame(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID) visualizeFeature(df = aggregate.df, text = rownames(metalonda_test_data)[1], group.levels = Group, prefix = pfx)
data(metalonda_test_data) pfx = tempfile() dir.create(file.path(pfx)) n.sample = 5 n.timepoints = 10 n.group = 2 Group = factor(c(rep(0, n.sample*n.timepoints), rep(1, n.sample*n.timepoints))) Time = rep(rep(1:n.timepoints, times = n.sample), 2) ID = factor(rep(1:(2*n.sample), each = n.timepoints)) points = seq(1, 10, length.out = 10) aggregate.df = data.frame(Count = metalonda_test_data[1,], Time = Time, Group = Group, ID = ID) visualizeFeature(df = aggregate.df, text = rownames(metalonda_test_data)[1], group.levels = Group, prefix = pfx)
Plot the longitudinal features along with the fitted splines
visualizeFeatureSpline( df, model, method, text, group.levels, unit = "days", ylabel = "Normalized Count", col = c("blue", "firebrick"), prefix = "Test" )
visualizeFeatureSpline( df, model, method, text, group.levels, unit = "days", ylabel = "Normalized Count", col = c("blue", "firebrick"), prefix = "Test" )
df |
dataframe has the Count , Group, ID, Time |
model |
the fitted model |
method |
The fitting method (negative binomial, LOWESS) |
text |
feature name |
group.levels |
The two level's name |
unit |
time unit used in the Time vector (hours, days, weeks, months, etc.) |
ylabel |
text to be shown on the y-axis of all generated figures (default: "Normalized Count") |
col |
two color to be used for the two groups (eg., c("red", "blue")). |
prefix |
prefix to be used to create directory for the analysis results |
Ahmed Metwally ([email protected])
Visualize all significant time intervals for all tested features
visualizeTimeIntervals( interval.details, prefix = "Test", unit = "days", col = c("blue", "firebrick") )
visualizeTimeIntervals( interval.details, prefix = "Test", unit = "days", col = c("blue", "firebrick") )
interval.details |
Dataframe has infomation about significant interval (feature name, start, end, dominant, p-value) |
prefix |
prefix for the output figure |
unit |
time unit used in the Time vector (hours, days, weeks, months, etc.) |
col |
two color to be used for the two groups (eg., c("red", "blue")). |
Ahmed Metwally ([email protected])
Visualize log2 fold-change and significance of each interval as volcano plot
visualizeVolcanoPlot(df, text, prefix = "Test")
visualizeVolcanoPlot(df, text, prefix = "Test")
df |
Dataframe has a detailed summary about feature's significant intervals |
text |
Feature name |
prefix |
prefix to be used to create directory for the analysis results |
Ahmed Metwally ([email protected])