Title: | Whole Genome Phylogenies Using Sequence Environments |
---|---|
Description: | Contains utilities for the analysis of protein sequences in a phylogenetic context. Allows the generation of phylogenetic trees base on protein sequences in an alignment-independent way. Two different methods have been implemented. One approach is based on the frequency analysis of n-grams, previously described in Stuart et al. (2002) <doi:10.1093/bioinformatics/18.1.100>. The other approach is based on the species-specific neighborhood preference around amino acids. Features include the conversion of a protein set into a vector reflecting these neighborhood preferences, pairwise distances (dissimilarity) between these vectors, and the generation of trees based on these distance matrices. |
Authors: | Juan Carlos Aledo [aut, cre] |
Maintainer: | Juan Carlos Aledo <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.3 |
Built: | 2024-11-05 05:28:17 UTC |
Source: | https://github.com/cran/EnvNJ |
Returns the residue found at the requested position.
aa.at(at, target)
aa.at(at, target)
at |
the position in the primary structure of the protein. |
target |
a character string specifying the sequence of the protein of interest. |
Returns a single character representing the residue found at the indicated position in the indicated protein.
aa.comp()
aa.at(2, 'MFSQQQRCVE')
aa.at(2, 'MFSQQQRCVE')
Returns a table with the amino acid composition of the target protein.
aa.comp(target, uniprot = TRUE)
aa.comp(target, uniprot = TRUE)
target |
a character string specifying the UniProt ID of the protein of interest or, alternatively, the sequence of that protein. |
uniprot |
logical, if TRUE the argument 'target' should be an ID. |
Returns a dataframe with the absolute frequency of each type of residue found in the target peptide.
aa.comp('MPSSVSWGILLLAGLCCLVPVSLAEDPQGDAAQK', uniprot = FALSE)
aa.comp('MPSSVSWGILLLAGLCCLVPVSLAEDPQGDAAQK', uniprot = FALSE)
Computes the frequency of each amino acid in each species.
aaf(data)
aaf(data)
data |
input data must be a dataframe (see details). |
Input data must be a dataframe where each row corresponds to an individual protein, and each column identifies a species. Therefore, the columns' names of this dataframe must be coherent with the names of the OTUs being analyzed.
A dataframe providing amino acid frequencies en the set of species. Rows correspond amino acids and columns to species.
env.sp(), otu.vector(), otu.space()
aaf(bovids)
aaf(bovids)
A dataset containing the sequences of mtDNA-encoded proteins from different bovids
bovids
bovids
A data frame with 13 rows (one per protein) and 11 variables (one per species)
https://www.ncbi.nlm.nih.gov/genome/organelle/
Converts cosines values into dissimilarities values.
cos2dis(cos)
cos2dis(cos)
cos |
a square upper triangular matrix where cos(i,j) is the cosine between the vector i and j. |
Cosines are standard measure of vector similarity, and can be converted into distance by dij = -log( (1 + cos(i,j) )/2).
A triangular matrix with the distances.
vcos()
data(bovids) vectors = otu.space(bovids[, 7:11]) cosData = vcos(vectors) disData = cos2dis(cosData)
data(bovids) vectors = otu.space(bovids[, 7:11]) cosData = vcos(vectors) disData = cos2dis(cosData)
Converts a phylip distance matrix into a either an R dataFrame or matrix.
d.phy2df(phyfile, as = 'matrix')
d.phy2df(phyfile, as = 'matrix')
phyfile |
path to the file containing the distances in phylip format. |
as |
class of the output R data. It should be either 'dataframe' or 'matrix'. |
Either a dataframe or a matrix containing the distances read from the phy file.
d.df2pny()
## Not run: d.phy2df(phyfile = "./data_t/d_dummy.txt")
## Not run: d.phy2df(phyfile = "./data_t/d_dummy.txt")
Converts a dataframe into a fasta file.
df2fasta(df, out.file)
df2fasta(df, out.file)
df |
a named (both rows and cols) dataframe (see details). |
out.file |
path and name of output file. |
The format of the df should be as follows. Each row represents a protein sequence and each column a species.
A fasta file that is saved in the specified path.
fastaconc()
## Not run: df2fasta(df = bovids, out.file = "./example.fasta")
## Not run: df2fasta(df = bovids, out.file = "./example.fasta")
Extracts the sequence environment around a given position.
env.extract(seq, c, r)
env.extract(seq, c, r)
seq |
a string protein sequence. |
c |
center of the environment. |
r |
radius of the environment. |
Returns a a strings with the extracted environment sequence.
Aledo et al. Sci Rep. 2015; 5: 16955. (PMID: 26597773)
env.matrices(), env.sp()
env.extract('ARGGQQQCATSYV', c = 8, r = 2)
env.extract('ARGGQQQCATSYV', c = 8, r = 2)
Builds trees based on the environment around the indicated amino acid(s).
env.fasta(file, r = 10, aa = 'all', out.file = 'any')
env.fasta(file, r = 10, aa = 'all', out.file = 'any')
file |
path to the single multispecies fasta file to be used as input. |
r |
a positive integer indicating the radius of the sequence segment considered as environment. |
aa |
the amino acid(s) to be used to encoded the species. |
out.file |
path and name of output file. Only if intermediate results data want to be saved (see details). |
This function builds alignment-independent phylogenetic trees. The input data is a fasta file. When an out.file path is provided, the environment sequences of each species and the vector representing each species are saved in the path provided.
A list with two objects, the first one is an inter-species distance matrix. The second one is an object of class 'phylo'.
envnj(), fastaconc()
## Not run: env.fasta(file = "./data_t/sample5.fasta")
## Not run: env.fasta(file = "./data_t/sample5.fasta")
Provides the frequencies of each amino acid within the environment.
env.matrices(env)
env.matrices(env)
env |
a character string vector containing the environments. |
Returns a list of two dataframes. The first, shown the environment in matrix form. The second provides the frequencies of each amino acid within the environments.
Aledo et al. Sci Rep. 2015; 5: 16955. (PMID: 26597773)
env.extract(), otu.vector()
env.matrices(c('ANQRmCTPQ', 'LYPPmQTPC', 'XXGSmSGXX'))
env.matrices(c('ANQRmCTPQ', 'LYPPmQTPC', 'XXGSmSGXX'))
Extracts the sequence environments around the selected amino acid(s) in the chosen species.
env.sp(data, sp, r = 10, aa = 'all', silent = TRUE)
env.sp(data, sp, r = 10, aa = 'all', silent = TRUE)
data |
input data must be a dataframe (see details). |
sp |
the species of interest (it should be named as in the input dataframe). |
r |
a positive integer indicating the radius of the sequence segment considered as environment. |
aa |
the amino acid(s) which environments are going to be extracted. |
silent |
logical. When FALSE the program progress is reported to alleviate loneliness. |
Input data must be a dataframe where each row corresponds to an individual protein. The columns contain the sequence of the protein corresponding to the row in each species. Therefore, the columns' names of this dataframe must be coherent with the names of the OTUs being analyzed.
A list of environment sequences. Each element from the list is a vector with the environment sequences around an amino acid. So, the length list is the same as the length of aa.
otu.vector(), otu.space()
data(bovids) env.sp(data = bovids, sp = "Bos_taurus", r = 2)
data(bovids) env.sp(data = bovids, sp = "Bos_taurus", r = 2)
Converts fasta files into environment vectors
envfascpp(path, r = 10, exefile, outfile)
envfascpp(path, r = 10, exefile, outfile)
path |
path to the folder that contain a the file 'list.txt', which contains the names of all the fasta files to be analyzed (one per line). The fasta files must be in the same path. |
r |
a positive integer indicating the radius of the sequence segment considered as environment. |
exefile |
path to the directory containing the envfas executable. |
outfile |
path to, and name of, output file. |
This function builds and write vectors representing the species' proteome. To use this function, you must first download the source code envfas.cpp (https://bitbucket.org/jcaledo/envnj/src/master/AncillaryCode/envfas.cpp) and compile it.
A txt file per fasta file analyzed. Each txt file can be read as a vector. Thus the number of lines gives the dimension of the vector.
envnj(), fastaconc()
## Not run: envfastacpp("./data_t/list.txt", 10, ".", "./MyResults")
## Not run: envfastacpp("./data_t/list.txt", 10, ".", "./MyResults")
Builds trees based on the environment around the indicated amino acid(s).
envnj(data, r = 10, aa = 'all', metric = "cosine", clustering = "nj", outgroup = 'any')
envnj(data, r = 10, aa = 'all', metric = "cosine", clustering = "nj", outgroup = 'any')
data |
input data must be a dataframe where each row corresponds to a protein sequence and each column to a species. |
r |
a positive integer indicating the radius of the sequence segment considered as environment. |
aa |
the amino acid(s) to be used to encoded the species. |
metric |
character string indicating the metric (see metrics() to see the methods allowed). |
clustering |
string indicating the clustering method, either "nj" or "upgma". |
outgroup |
when a rooted tree is desired, it indicates the species to be used as outgroup. |
This function builds alignment-independent phylogenetic trees.
A list with two objects, the first one is an inter-species distance matrix. The second one is an object of class 'phylo'.
otu.space(), metrics()
data(bovids) envnj(bovids[, 7:11], aa = "all", outgroup = "Pseudoryx_nghetinhensis")
data(bovids) envnj(bovids[, 7:11], aa = "all", outgroup = "Pseudoryx_nghetinhensis")
Concatenate fasta files from different species in a single multispecies fasta file.
fastaconc(otus, inputdir = ".", out.file = "./concatenated_multispecies.fasta")
fastaconc(otus, inputdir = ".", out.file = "./concatenated_multispecies.fasta")
otus |
a character vector giving the otus' names. |
inputdir |
path to the directory containing the individual fasta files. |
out.file |
path and name of output file. |
When we have fasta files (extension should be '.fasta'), each one for a species containing different sequences of the given species, this function concatenate the different sequences of the same species and writes it as a single sequence in a single multispecies fasta file. If the individual fasta files are found in the working directory, the inputdir argument don't need to be passed. The names of the individual fasta files must match the otus' names.
A single multispecies fasta file with the sequences of each species spliced in a single sequence.
df2fasta()
## Not run: fastaconc(otus = c('Glis_glis', 'Ovis_aries', 'Sus_scrofa'))
## Not run: fastaconc(otus = c('Glis_glis', 'Ovis_aries', 'Sus_scrofa'))
Computes the dissimilarity between n-dimensional vectors.
metrics(vset, method = 'euclidean', p = 2)
metrics(vset, method = 'euclidean', p = 2)
vset |
matrix (n x m) where each column is a n-dimensional vector. |
method |
a character string indicating the distance/dissimilarity method to be used (see details). |
p |
power of the Minkowski distance. This parameter is only relevant if the method 'minkowski' has been selected. |
Although many of the offered methods compute a proper distance, that is not always the case. For instance, for a non null vector, v, the 'cosine' method gives d(v, 2v) = 0, violating the coincidence axiom. For that reason we prefer to use the term dissimilarity instead of distance. The methods offered can be grouped into families.
('euclidean', 'manhattan', 'minkowski', 'chebyshev')
Euclidean = sqrt( sum | P_i - Q_i |^2)
Manhattan = sum | P_i - Q_i |
Minkowski = ( sum| P_i - Q_i |^p)^1/p
Chebyshev = max | P_i - Q_i |
('sorensen', 'soergel', 'lorentzian', 'kulczynski', 'canberra')
Sorensen = sum | P_i - Q_i | / sum (P_i + Q_i)
Soergel = sum | P_i - Q_i | / sum max(P_i , Q_i)
Lorentzian = sum ln(1 + | P_i - Q_i |)
Kulczynski = sum | P_i - Q_i | / sum min(P_i , Q_i)
Canberra = sum | P_i - Q_i | / (P_i + Q_i)
('non-intersection', 'wavehedges', 'czekanowski', 'motyka')
Non-intersection = 1 - sum min(P_i , Q_i)
Wave-Hedges = sum | P_i - Q_i | / max(P_i , Q_i)
Czekanowski = sum | P_i - Q_i | / sum | P_i + Q_i |
Motyka = sum max(P_i , Q_i) / sum (P_i , Q_i)
('cosine', 'jaccard')
Cosine = - ln(0.5 (1 + (P_i Q_i) / sqrt(sum P_i^2) sqrt(sum Q_i^2)))
Jaccard = 1 - sum (P_i Q_i) / (sum P_i^2 + sum Q_i^2 - sum (P_i Q_i))
('bhattacharyya', 'squared_chord')
Bhattacharyya = - ln sum sqrt(P_i Q_i)
Squared-chord = sum ( sqrt(P_i) - sqrt(Q_i) )^2
('squared_chi')
Squared-Chi = sum ( (P_i - Q_i )^2 / (P_i + Q_i) )
('kullback-leibler', 'jeffreys', 'jensen-shannon', 'jensen_difference')
Kullback-Leibler = sum P_i * log(P_i / Q_i)
Jeffreys = sum (P_i - Q_i) * log(P_i / Q_i)
Jensen-Shannon = 0.5(sum P_i ln(2P_i / (P_i + Q_i)) + sum Q_i ln(2Q_i / (P_i + Q_i)))
Jensen difference = sum (0.5(P_i log(P_i) + Q_i log(Q_i)) - 0.5(P_i + Q_i) ln(0.5(P_i + Q_i))
('hamming', 'mismatch', 'mismatchZero', 'binary')
Hamming = (# coordinates where P_i != Q_i) / n
Mismatch = # coordinates where that P_i != Q_i
MismatchZero = Same as mismatch but after removing the coordinates where both vectors have zero.
Binary = (# coordinates where a vector has 0 and the other has a non-zero value) / n.
('taneja', 'kumar-johnson', 'avg')
Taneja = sum ( P_i + Q_i / 2) log( P_i + Q_i / ( 2 sqrt( P_i * Q_i)) )
Kumar-Johnson = sum (P_i^2 - Q_i^2)^2 / 2 (P_i Q_i)^1.5
Avg = 0.5 (sum | P_i - Q_i| + max | P_i - Q_i |)
A matrix with the computed dissimilarity values.
Sung-Hyuk Cha (2007). International Journal of Mathematical Models and Methods in Applied Sciences. Issue 4, vol. 1
Luczac et al. (2019). Briefings in Bioinformatics 20: 1222-1237.
https://r-snippets.readthedocs.io/en/latest/real_analysis/metrics.html
vcos(), vdis()
metrics(matrix(1:9, ncol =3), 'cosine')
metrics(matrix(1:9, ncol =3), 'cosine')
Carries out a MSA of a set of different orthologous proteins in different species.
msa.merge(data, outfile = 'any')
msa.merge(data, outfile = 'any')
data |
input data must be a dataframe where each row corresponds to a protein and each column to a species. |
outfile |
path to the place where a fasta file is going to be saved. If 'any', no file is saved. |
The input data has the same format that the input data used for EnvNJ or SVD-n-Gram methods. Thus, the name of columns must correspond to that of species.
A dataframe containing the MSA (species x position).
msa.tree
## Not run: data(bovids) msa.merge(bovids) ## End(Not run)
## Not run: data(bovids) msa.merge(bovids) ## End(Not run)
Infers a tree base on a MSA.
msa.tree(data, outgroup = 'any')
msa.tree(data, outgroup = 'any')
data |
input data must be a dataframe where each row corresponds to a protein and each column to a species. |
outgroup |
when a rooted tree is desired, indicate the species to be used as outgroup. |
The input data has the same format that the input data used for EnvNJ or SVD-n-Gram methods. Thus, the name of columns must correspond to that of species.
A list containing the (i) MSA, (ii) the distance matrix and (iii) the tree.
msa.merge
## Not run: data(bovids) msa.tree(bovids) ## End(Not run)
## Not run: data(bovids) msa.tree(bovids) ## End(Not run)
Computes normalized compression distances.
ncd(seq1, seq2)
ncd(seq1, seq2)
seq1 |
character string indicating the path to the first fasta file to be analyzed. |
seq2 |
character string indicating the path to the second fasta file to be analyzed. |
The two fasta files must be in the working directory. This function use zpaq to compress files. Thus, the zpaq software must be installed on your system and in the search path for executables if you wish to use this function. NCD = (Z(xy) - min(Z(x), Z(y))) / max(Z(x), Z(y)) Where Z(x), Z(y) and Z(xy) are the lengths of the compressed versions of seq1, seq2 and the concatenated sequences 1 and 2, respectively.
A non-negative real value reflecting the dissimilarity between seq1 and seq2.
ncdnj()
try(ncd(seq1 = "./A.fasta", seq2 = "./B.fasta"))
try(ncd(seq1 = "./A.fasta", seq2 = "./B.fasta"))
Computes a distance matrix using normalized compression distance.
ncdnj(wd)
ncdnj(wd)
wd |
character string indicating the path to the directory where the input files can be found (see details). |
The input files, which must be found at the wd provided, consist of a file named 'list.txt' containing the names of the fasta files to be analyzed (one per line). The referred fasta files also must be found at the provided wd. This function use zpaq to compress files. Thus, the zpaq software must be installed on your system and in the search path for executables if you wish to use this function.
A list where the first element is a symmetric distance matrix and the second one is a phylogenetic tree build using NJ.
ncd()
try(ncdnj("./data_t"))
try(ncdnj("./data_t"))
Computes the n-gram frequencies vector for a given protein.
ngram(prot, k = 4)
ngram(prot, k = 4)
prot |
a character string corresponding to the primary structure of the protein. |
k |
a positive integer, between 1 and 5, indicating the k-mer of the words to be counted. |
The one letter code for amino acids is used (capital).
A dataframe with two columns, the first one given the peptides and the second one the corresponding absolute frequency.
Stuart et al. Bioinformatics 2002; 18:100-108.
ngraMatrix(), ffp(), svdgram()
ngram(bovids$Bos_taurus[1], k = 3)
ngram(bovids$Bos_taurus[1], k = 3)
Computes the n-gram frequencies dataframe for the protein and species provides.
ngraMatrix(data, k = 4, silent = FALSE)
ngraMatrix(data, k = 4, silent = FALSE)
data |
a dataframe with as many columns as species and one row per orthologous protein. The rows and columns must be named accordingly. |
k |
a positive integer, between 1 and 5, indicating the k-mer of the words to be counted. |
silent |
logical, set to FALSE to avoid loneliness. |
The argument prot can be obtained using orth() and orth.seq().
A list with two dataframes. The first one with nsp * npr columns (nsp: number of species, npr: number of proteins per species) and npe rows (npe: number of peptides, 20 for n = 1, 400 for n = 2, 8000 for n = 3 and 160000 for n = 4). The entries of the dataframe are the number of times that the indicated peptide has been counted in the given protein. Orthologous proteins are in consecutive columns, thus the first nsp columns are the orthologous of protein 1 and so on. The second dataframe contains the Species Vector Sums (each vector describes one species).
Stuart et al. Bioinformatics 2002; 18:100-108.
ngram(), svdgram()
ngraMatrix(bovids[,1:3], k = 2)
ngraMatrix(bovids[,1:3], k = 2)
Computes the matrix representing the species vector subspace.
otu.space(data, r = 10, aa = "all", silent = TRUE)
otu.space(data, r = 10, aa = "all", silent = TRUE)
data |
input data must be a dataframe (see details). |
r |
a positive integer indicating the radius of the sequence segment considered as environment. |
aa |
the amino acid(s) to be used to encoded the species. |
silent |
logical. If FALSE, the running progress is reported. |
Input data must be a dataframe where each row corresponds to an individual protein, and each column identifies a species. Therefore, the columns' names of this dataframe must be coherent with the names of the OTUs being analyzed. The dimension of the vector representing each species will depend on the settings. For instance, if we choose a single amino acid and a radius of 10 for the sequence environment, then we will get a vector of dimension 400 (20 amino acids x 20 positions). If we opt for the 20 amino acids and r = 10, then the vector will be of dimension 8000 (400 for each amino acid * 20 amino acids). Please, note that r is selected in the function env.sp() that will provide the input dataframe for the current function.
A matrix representing the species vector subspace.
env.sp(), otu.vector()
data(bovids) otu.space(bovids[, 1:5], r = 2)
data(bovids) otu.space(bovids[, 1:5], r = 2)
Converts a set of sequence environments into a vector.
otu.vector(envl, sp = "", aa = "all", silent = TRUE)
otu.vector(envl, sp = "", aa = "all", silent = TRUE)
envl |
a list containing the sequence environment of a species (as the one returned by the function env.sp()). |
sp |
character string indicating the species being analyzed. |
aa |
the amino acid(s) to be used to encoded the species. |
silent |
logical. When FALSE the program progress is reported to alleviate loneliness. |
The dimension of the vector representing the species will depend on the settings. For instance, if we choose a single amino acid and a radius of 10 for the sequence environment, then we will get a vector of dimension 400 (20 amino acids x 20 positions). If we opt for the 20 amino acids and r = 10, then the vector will be of dimension 8000 (400 for each amino acid * 20 amino acids). Please, note that r is selected in the function env.sp() that will provide the input dataframe for the current function.
A matrix representing the species. This matrix can be converted into a vector representing the target species just typing as.vector(matrix). Each coordinate is the frequency of a given amino acid at a certain position from the environment (see details).
env.sp(), otu.space()
data(bovids) cow = env.sp(bovids, "Bos_taurus") otu.vector(cow)
data(bovids) cow = env.sp(bovids, "Bos_taurus") otu.vector(cow)
A dataset containing the sequences of mtDNA-encoded proteins from different mammalian species (Reyes et al. 1996, Mol.Biol. Evol. 17:979)
reyes
reyes
A data frame with 13 rows (one per protein) and 34 variables (one per species)
https://www.ncbi.nlm.nih.gov/genome/organelle/
Computes phylogenetic trees using an n-gram and SVD approach.
svdgram(matrix, rank, species, SVS = TRUE)
svdgram(matrix, rank, species, SVS = TRUE)
matrix |
either a dataframe or a matrix where each row represents a property of a protein (for instance, the frequencies of tetrapeptides) and each column represents a different protein (or species). |
rank |
a numeric array providing the ranks that want to be used to approach the data matrix using SVD. |
species |
character array providing the species' names. |
SVS |
logical. When the matrix passed as argument correspond to the peptide-protein matrix and SVS is set to TRUE, then the function will compute a matrix where the columns are the Species Vector Sums. Alternatively, if the matrix passed as argument is already a matrix where the columns encode for species, SVS should be set to FALSE. |
When the matrix passed as argument is a matrix of peptide-protein, the function implement the method described by Stuart et al. 2002 (see references).
An object of class multiPhylo containing a tree for each rank value required.
Stuart et al. Bioinformatics 2002; 18:100-108.
ngraMatrix()
a <- ngraMatrix(bovids[, 1:4], k = 2)[[2]][, -1] species <- names(a) svdgram(matrix = a, rank = 4, species = species, SVS = FALSE)
a <- ngraMatrix(bovids[, 1:4], k = 2)[[2]][, -1] species <- names(a) svdgram(matrix = a, rank = 4, species = species, SVS = FALSE)
Computes pairwise cosines of the angles between vectors.
vcos(vectors, silent = TRUE, digits = 3)
vcos(vectors, silent = TRUE, digits = 3)
vectors |
a named list (or dataframe) containing n-dimensional vectors. |
silent |
logical, set to FALSE to avoid loneliness. |
digits |
integer indicating the number of decimal places. |
Cosines are standard measure of vector similarity. If the angle between two vectors in n-dimensional space is small, then the individual elements of their vectors must be very similar to each other in value, and the calculated cosine derived from these values is near one. If the vectors point in opposite directions, then the individual elements of their vectors must be very dissimilar in value, an the calculated cosine is near minus one.
A triangular matrix with the cosines of the angles formed between the given vectors.
vdis()
vcos(otu.space(bovids[, 1:4]))
vcos(otu.space(bovids[, 1:4]))
Computes pairwise distances between vectors.
vdis(cos)
vdis(cos)
cos |
a square upper triangular matrix where cos(i,j) is the cosine between the vector i and j. |
Cosines are standard measure of vector similarity, and can be converted into distance by dij = -log( (1 + cos(i,j) )/2).
A triangular matrix with the distances.
vcos()
data(bovids) vectors = otu.space(bovids[, 7:11]) cosData = vcos(vectors) disData = suppressWarnings(vdis(cosData))
data(bovids) vectors = otu.space(bovids[, 7:11]) cosData = vcos(vectors) disData = suppressWarnings(vdis(cosData))
Converts a set of vectors into a tree.
vect2tree(path, metric = "cosine", clustering = "nj")
vect2tree(path, metric = "cosine", clustering = "nj")
path |
path to the working directory. This directory must contain a txt file per vector and an additional txt file named vlist.txt that provides the names (one per line) of the vector txt files. |
metric |
character string indicating the metric (see metrics() to see the methods allowed). |
clustering |
string indicating the clustering method, either "nj" or "upgma". |
This function computes the distance matrix and builds the corresponding tree.
a list with two elements: a distance matrix and a tree.
envnj(), fastaconc(), envfascpp()
## Not run: vec2tree("./data_t")
## Not run: vec2tree("./data_t")
Builds a tree when species are encoded by n-dim vectors.
vtree(matrix, outgroup = 'any')
vtree(matrix, outgroup = 'any')
matrix |
either a dataframe or matrix where each column represents an OTU. |
outgroup |
when a rooted tree is desired, it indicates the species to be used as outgroup. |
The method is based on a distance matrix obtained after converting the cos between vector (similarity measurement) in a dissimilarity measurement.
A list with two objects, the first one is an inter-species distance matrix. The second one is an object of class 'phylo'.
svdgram
data(bovids) mymatrix <- ngraMatrix(bovids[, 6:11], k = 2)[[2]][, 2:7] vtree(mymatrix, outgroup = "Pseudoryx_nghetinhensis")
data(bovids) mymatrix <- ngraMatrix(bovids[, 6:11], k = 2)[[2]][, 2:7] vtree(mymatrix, outgroup = "Pseudoryx_nghetinhensis")