Title: | Parsing Semi-Structured Log Files into Tabular Format |
---|---|
Description: | Convert semi-structured log files (such as 'Apache' access.log files) into a tabular format (data.frame) using a standard template system. |
Authors: | Austin Nar |
Maintainer: | Austin Nar <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2024-10-29 03:15:30 UTC |
Source: | https://github.com/cran/tabulog |
List of parser classes provided 'out-of-the-box'. These can be used in any template without further definition, or can be overridden.
default_classes(file = system.file("config/parser_classes.yml", package = "tabulog"), formatters = .default_formatters())
file |
YAML file of parser classes to load. Defaults to the file included with the package. |
formatters |
Named list of formatter functions to be associated with parsers. Default formatters are provided for the default parser classes. |
Parser classes are provided for the following:
ip: For matching IP addresses
quote: For matching any string enclosed in double quotes
url: For matching a standard HTTP(S) URL
int: For matching any integer
double: For matching any numeric value (including integers)
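To make the list above more concrete, the following base R sketch shows regexes of the kind such classes might use. The patterns, the `sketch_classes` list, and the `first_match` helper are all assumptions made for illustration; the package's actual definitions live in its `parser_classes.yml` file and may differ.

```r
# Hypothetical patterns, for illustration only; the package's real
# definitions are loaded from config/parser_classes.yml and may differ.
sketch_classes <- list(
  ip     = "[0-9]{1,3}(\\.[0-9]{1,3}){3}",
  quote  = "\"[^\"]*\"",
  url    = "https?://[^[:space:]\"]+",
  int    = "[0-9]+",
  double = "[0-9]+(\\.[0-9]+)?"
)

# Return the first match of a pattern in a single string (NA if none).
first_match <- function(pattern, x) {
  m <- regexpr(pattern, x)
  if (m == -1) NA_character_ else regmatches(x, m)
}

record <- '10.0.0.7 "GET /index.html" https://example.com/a 200 0.35'

# Named character vector: one first match per sketched class
hits <- vapply(sketch_classes, first_match, character(1), x = record)
hits
```

Note that `int` and `double` both match the leading digits of the IP address here, because each pattern is applied to the whole record on its own. In a template, the position of a field within the record disambiguates which text a class captures.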
A named list of the default parser classes provided "out of the box". Users should not need to call this in their own code; it is mostly used by other internal functions. It is exported only so that users can call it to see which classes are available by default.
default_classes()
Format a parser object for printing
## S3 method for class 'parser'
format(x, ...)
x |
parser to be formatted |
... |
other arguments to be passed to |
# No name, default formatter
format(parser('[0-9]+'))
# Custom name and formatter
format(parser('[0-9]+', as.integer, name='int'))
Get or set the formatter for a parser
formatter(x)
formatter(x) <- value
x |
parser |
value |
formatter function to be set |
The formatter attribute (should be a function) for the passed object (usually a parser object)
p <- parser('[0-9]+')
# Default formatter
formatter(p)
# Set formatter
formatter(p) <- as.integer
# Custom formatter
formatter(p)
Get or set the name for a parser
name(x)
name(x) <- value
x |
parser |
value |
Name to be set |
The name attribute (should be a character) for the passed object (usually a parser object)
p <- parser('[0-9]+')
# Default name (NULL)
name(p)
# Set name
name(p) <- 'int'
# Custom name
name(p)
Parse a log file with a provided template and a set of classes
parse_logs(text, template, classes = list(), ...)
parse_logs_file(text_file, config_file, formatters = list(), ...)
text |
Character vector; each element a log record |
template |
Template string |
classes |
A named list of parsers or regex strings for use within the template string |
... |
Other arguments passed onto |
text_file |
Filename (or readable connection) containing log text |
config_file |
Filename (or readable connection) containing template file |
formatters |
Named list of formatter functions for use of formatting |
template should only be a template string, such as '{{ip ip_address}} [{{date access_date}}]...'.
config_file should be a YAML file or connection with the following fields:
template: Template string
classes: Named list of regex strings for building classes
text should be a character vector, with each element representing a log record.
text_file should be a file or connection that can be split (with readLines) into a character vector of records.
classes should be a named list of parser objects, where names match the names of classes in the template string, or a similarly named list of regex strings to be coerced into parsers.
formatters should be a named list of functions, where names match the names of classes in the template string, used to format fields once they have been captured.
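As a rough mental model of how a template string and a set of classes can be combined, the base R sketch below turns each `{{class field}}` placeholder into a capture group and reads the fields out of every record. This is not the package's implementation: `sketch_parse` and `escape_regex` are hypothetical helpers, and the sketch assumes that class patterns contain no capturing groups of their own.

```r
# Toy illustration only; NOT how the package is implemented.
# Assumes class regexes contain no capturing groups of their own.

# Escape regex metacharacters in the literal parts of a template.
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

sketch_parse <- function(text, template, classes) {
  ph <- "\\{\\{[[:space:]]*([[:alnum:]_]+)[[:space:]]+([[:alnum:]_]+)[[:space:]]*\\}\\}"
  pos <- gregexpr(ph, template, perl = TRUE)

  # Placeholders and the literal text between them
  placeholders <- regmatches(template, pos)[[1]]
  literals <- escape_regex(regmatches(template, pos, invert = TRUE)[[1]])

  class_names <- sub(ph, "\\1", placeholders, perl = TRUE)
  field_names <- sub(ph, "\\2", placeholders, perl = TRUE)
  stopifnot(all(class_names %in% names(classes)))

  # Interleave literal text with one capture group per field
  groups <- paste0("(", unlist(classes[class_names]), ")")
  n <- length(groups)
  full <- paste0(
    "^", paste0(c(rbind(literals[seq_len(n)], groups)), collapse = ""),
    literals[n + 1], "$"
  )

  # One row per record; NA where a record does not match the template
  m <- regmatches(text, regexec(full, text, perl = TRUE))
  rows <- lapply(m, function(v) if (length(v)) v[-1] else rep(NA_character_, n))
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- field_names
  out
}

template <- "{{ip ipAddress}} - [{{int status}}]"
classes <- list(ip = "[0-9]{1,3}(?:\\.[0-9]{1,3}){3}", int = "[0-9]+")
logs <- c("192.168.1.10 - [200]", "192.168.1.11 - [404]")

parsed <- sketch_parse(logs, template, classes)
# The "formatter" step: convert a captured column to its intended type
parsed$status <- as.integer(parsed$status)
parsed
```

The sketch anchors the pattern at both ends of the record, so a record that does not fit the template yields a row of NA values rather than a partial match.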
A data.frame with each field identified in the template string as a column.
For each record in the passed text, the fields were extracted and formatted
using the parser objects in default_classes() and classes.
# Template string with three fields
template <- '{{ip ipAddress}} - [{{date accessDate}}] {{int status }}'
# Two simple log records
logs <- c(
  '192.168.1.10 - [26/Jul/2019:11:41:10 -0500] 200',
  '192.168.1.11 - [26/Jul/2019:11:41:21 -0500] 404'
)
# A formatter for the date field
myFormatters <- list(
  date = function(x) lubridate::as_datetime(x, format = '%d/%b/%Y:%H:%M:%S %z')
)
# A parser class for the date field
date_parser <- parser(
  '[0-3][0-9]\\/[A-Z][a-z]{2}\\/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}[ ][\\+|\\-][0-9]{4}',
  myFormatters$date,
  'date'
)
# Parse the logs from raw data
parse_logs(logs, template, list(date=date_parser))
# Write the logs and the template to file, then parse
logfile <- tempfile()
templatefile <- tempfile()
writeLines(logs, logfile)
yaml::write_yaml(list(template=template, classes=list(date=date_parser)), templatefile)
parse_logs_file(logfile, templatefile, myFormatters)
file.remove(logfile)
file.remove(templatefile)
Create or test for parser objects. These objects will be used by templates to identify a field within a log file.
parser(x, f, name = NULL)
is.parser(x)
x |
A regex string, a parser, or a list of either; or the object to be tested |
f |
A function to format the captured output, or a named list of such
functions if |
name |
An optional name for the parser |
Parser objects contain three things:
A regular expression that matches the given field
A 'formatter': a function that modifies the captured text in some way. By default, this is the identity function
(Optional) A name for the parser
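The three components listed above can be pictured with a small stand-in object in base R. The names `toy_parser` and `apply_toy_parser` are hypothetical and exist only for this sketch; it stores the parts in a plain list for readability and makes no claim about how the package represents a parser internally.

```r
# Hypothetical stand-in for illustration; not the package's constructor.
toy_parser <- function(regex, f = identity, name = NULL) {
  stopifnot(is.character(regex), length(regex) == 1, is.function(f))
  structure(
    list(regex = regex, formatter = f, name = name),
    class = "toy_parser"
  )
}

# Capture the first match in each string, then run the formatter on it.
apply_toy_parser <- function(p, x) {
  m <- regexpr(p$regex, x)
  captured <- rep(NA_character_, length(x))
  captured[m != -1] <- regmatches(x, m)
  p$formatter(captured)
}

int_parser <- toy_parser("[0-9]+", as.integer, name = "int")
status <- apply_toy_parser(int_parser, c("status=200", "status=404", "no digits"))
status
```

Keeping the formatter separate from the regex means the same pattern can be reused with different output types, for example digits captured as text in one template and as integers in another.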
parser and its S3 methods coerce x to a parser object, returning said parser object. is.parser returns TRUE or FALSE.
# Captures integers
parser('[0-9]+')
# Captures integers, cast to integers
parser('[0-9]+', as.integer)
# List of parsers, all named (inferred from list names), some with formatters
parser(
  list(
    ip = '[0-9]{1,3}(\\.[0-9]{1,3}){3}',
    int = '[0-9]+',
    date = '[0-9]{4}\\-[0-9]{2}\\-[0-9]{2}'
  ),
  list(int = as.integer, date = as.Date)
)
is.parser(parser('[0-9]+')) # TRUE
is.parser(100) # FALSE
Print a parser object. The underlying method uses cat.
## S3 method for class 'parser'
print(x, ...)
x |
parser to be printed |
... |
Other arguments; ignored |
x, invisibly
# No name, default formatter
print(parser('[0-9]+'))
# Custom name and formatter
print(parser('[0-9]+', as.integer, name='int'))