Splunk Field Extraction Walkthrough

Travis Hall
30 Aug 2022 · 26:48

Summary

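TLDR In this video, Travis from Splunk shares how he approaches field extraction in Splunk. He first describes his background with Splunk, then walks through field extraction in detail, including the Field Extractor and a worked example on a Debian package log data set. He emphasizes using the props and transforms files to refine how data is parsed, and offers practical tips such as using regex101 to help build regular expressions. The video aims to help users better understand and use Splunk for data analysis and visualization.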

Takeaways

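  • 📈 Learn how to use Splunk for data analysis and visualization, particularly for parsing Linux system logs.
  • 🔍 See how Splunk's Field Extractor is used to extract fields, and where its limitations show up.
  • 🔧 Use regular expressions to match and extract fields.
  • 🛠️ Understand the role of the props and transforms files in data parsing.
  • 📚 Splunk documentation and online tools such as regex101 are useful for learning regular expressions and data parsing.
  • 🎥 The video walkthrough makes Splunk operations and best practices easier to follow.
  • 🖥️ Use searches and commands in Splunk to query and analyze data.
  • 🔄 Improve data parsing by modifying the props and transforms files.
  • 📊 Build dashboards and visualizations in Splunk to understand the data better.
  • 🔗 Splunk community resources such as gosplunk.com and Splunk Lantern help you find and share queries and dashboards.
  • 💡 Continued learning and practice are key to improving Splunk skills.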

Q & A

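  • How does Travis help others understand the product?

    -Travis creates videos that show how he works with data and solves problems in Splunk, so that others can understand the product better and get more use out of it.

  • What does Travis mean by data ingestion?

    -Data ingestion is the process of bringing data into a Splunk environment. Travis assumes viewers have already ingested their data, or are in the process of bringing it in.

  • What is the purpose of field extraction in Splunk?

    -Field extraction identifies and creates new fields from raw data, which makes the data easier to analyze and visualize, for example in dashboards.

  • Which method does Travis use to extract fields in the video?

    -Travis uses Splunk's built-in Field Extractor and chooses the regular expression route rather than the delimited route, because his data is not cleanly separated by a delimiter.

  • What challenges does Travis run into with the Debian package log?

    -When he tries to extract specific fields from the Debian package log, he hits the limitations of the Field Extractor and cannot extract every value he needs in one pass.

  • Which tool does Travis mention for understanding and building regular expressions?

    -Travis mentions regex101, an online tool that shows match information, a quick reference, and an explanation of each part of a pattern.

  • What do the props and transforms files do in Splunk?

    -The props file matches data by source or source type and says which settings and extractions apply to it. The transforms file holds the regex-based rules that props refers to. In the video Travis uses the pair to build search-time field extractions for the dpkg log.

  • How does Travis test and debug his Splunk configuration files?

    -Travis edits local copies of the props and transforms files to test and debug, and once the changes are confirmed he pushes them to the search heads and universal forwarders.

  • Which Splunk community resources does Travis mention?

    -Travis mentions gosplunk.com and Splunk Lantern, which offer queries, dashboards, and other Splunk content for users to learn from and reuse.

  • Which Splunk commands does Travis mention in the video?

    -Travis mentions the rex command, and shows how to use stats (with count) and rename to analyze and present data.

  • Which fields does Travis create in the video?

    -Travis creates a field named action and extracts values such as installed, unpacked, configured, startup, upgrade, and remove.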

Outlines

00:00

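📺 Why the video was made, and Splunk basics

Travis describes his experience with Splunk and his work since joining the company. He stresses the value of helping others understand and make better use of the product, and explains his idea of sharing how he does things in Splunk through videos. This segment also introduces the basics of field extraction, including parsing the data and turning it into reports and visualizations.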

05:00

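🔍 Field extraction and regular expressions

Travis explains in detail how to extract fields in Splunk, in particular by parsing data with regular expressions. Using a concrete data set, he shows how Splunk's Field Extractor identifies and extracts key information. He also points out the tool's limitations and shows how to find more help by browsing and searching the Splunk documentation.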
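The limitation described above can be illustrated outside Splunk. The following is a minimal Python sketch, not the Field Extractor's actual output: the sample events are made up in the style of dpkg.log lines, and both patterns are assumptions chosen to mimic the behavior Travis describes, where a generated pattern picks up `packages` instead of `startup` and stops at the hyphen in `config-files`.

```python
import re

# Made-up events in the style of dpkg.log lines (not taken from the video).
SAMPLE_EVENTS = [
    "2022-08-30 10:15:01 startup packages configure",
    "2022-08-30 10:15:02 status half-installed curl:amd64 7.68.0-1ubuntu2",
    "2022-08-30 10:15:03 status config-files oldpkg:amd64 1.0-1",
    "2022-08-30 10:15:04 remove oldpkg:amd64 1.0-1 <none>",
]

# Assumed stand-in for a point-and-click pattern: skip three tokens,
# then capture one run of lowercase letters.
NAIVE = re.compile(r"^(?:\S+\s+){3}(?P<action>[a-z]+)")

# Hand-tuned pattern: skip date and time, optionally skip "status",
# then capture a word that may contain hyphens.
TUNED = re.compile(r"^\S+\s+\S+\s+(?:status\s+)?(?P<action>[\w-]+)")


def extract_action(pattern, event):
    """Return the captured action, or None when the pattern does not match."""
    match = pattern.search(event)
    return match.group("action") if match else None


naive = [extract_action(NAIVE, e) for e in SAMPLE_EVENTS]
tuned = [extract_action(TUNED, e) for e in SAMPLE_EVENTS]
print(naive)
print(tuned)
```

The position-based pattern works only when every event has the same shape, which is why a hand-written expression ends up being the more reliable choice for this log.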

10:01

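📝 Parsing data with the props and transforms files

Travis explains how to use Splunk's props and transforms files to refine data parsing. Working through the files in the Splunk Add-on for Unix and Linux, he shows how the props file renames a source type and how the transforms file applies regular expression patterns to parse log files. He also stresses that these configuration files should not be edited directly in the default folder.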
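To make the source-matching step concrete, here is a rough Python model of how a source pattern such as `.../dpkg.log` might select extraction rules. It is a simplification, not Splunk's implementation: it assumes that `...` matches any characters including `/`, that `*` matches any characters except `/`, and that everything else is literal. The `PROPS` table and the rule names in it are hypothetical.

```python
import re


def source_pattern_to_regex(pattern):
    """Translate a simplified source wildcard pattern into a compiled regex.

    Assumed rules: "..." matches anything (including "/"),
    "*" matches anything except "/", all other characters are literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            parts.append(".*")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")


# Hypothetical table standing in for a props stanza that points at
# named extraction rules defined elsewhere.
PROPS = {".../dpkg.log": ["dpkg_installed", "dpkg_startup"]}


def rules_for(source):
    """Return the rule names whose source pattern matches this source path."""
    matched = []
    for pattern, rule_names in PROPS.items():
        if source_pattern_to_regex(pattern).match(source):
            matched.extend(rule_names)
    return matched


print(rules_for("/var/log/dpkg.log"))
print(rules_for("/var/log/syslog"))
# Under this model, a fourth dot is read as a literal "." and the match fails.
print(source_pattern_to_regex("..../dpkg.log").match("/var/log/dpkg.log"))
```

Under these assumptions a four-dot pattern silently stops matching, which is one plausible reading of the typo Travis describes spending half an hour on.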

15:01

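🔧 Building and testing regular expressions

In this segment, Travis demonstrates how to build and test regular expressions on the regex101 website. He explains how to capture specific groups of text and shows how to carry those expressions into Splunk's transforms file so that log data is parsed more precisely.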
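The same build-and-test loop can be sketched in Python, adding one named capture group at a time and checking the result against a sample event. The event below is made up, and the field names `package`, `arch`, and `version` are illustrative choices, not names confirmed by the video; only `action` and the `installed` value come from the walkthrough.

```python
import re

# Made-up event in the style of a dpkg.log "status installed" line.
EVENT = "2022-08-30 10:15:05 status installed curl:amd64 7.68.0-1ubuntu2"

# One named group per piece of the event; field names are illustrative.
INSTALLED = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"status\s+(?P<action>installed)\s+"
    r"(?P<package>[^\s:]+)(?::(?P<arch>\S+))?\s+"
    r"(?P<version>\S+)$"
)


def parse_installed(event):
    """Return a dict of captured fields, or None if the event does not match."""
    match = INSTALLED.match(event)
    return match.groupdict() if match else None


print(parse_installed(EVENT))
# A different event type should fall through to another pattern.
print(parse_installed("2022-08-30 10:15:04 remove oldpkg:amd64 1.0-1 <none>"))
```

Keeping one pattern per event type, rather than a single pattern for everything, mirrors the per-action rules shown in the transforms file.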

20:03

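📂 File permissions and managing Splunk configuration

Travis discusses managing Splunk configuration files on Linux. He explains how to create and edit the props and transforms files and stresses file permissions, making sure the Splunk user has the right access to the files. He also shows how to verify the parsing results with a Splunk search.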

25:04

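🌐 Splunk community resources and data visualization

In the last part of the video, Travis introduces Splunk community resources such as gosplunk.com and Splunk Lantern, which offer a wide range of queries and dashboard templates. He also shows how to use Splunk's search and stats commands to build visual reports, and encourages viewers to reach out through the comments and the Splunk community.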
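Counting events per extracted value is the core of the reporting step. As a rough Python analogue of grouping and counting by the `action` field (the events below are hypothetical, and this is not how Splunk computes its statistics):

```python
from collections import Counter

# Hypothetical events after field extraction; the last one has no action.
EVENTS = [
    {"action": "installed", "package": "curl"},
    {"action": "installed", "package": "vim"},
    {"action": "remove", "package": "oldpkg"},
    {"package": "no-action-field"},
]


def count_by(events, field):
    """Count events per value of `field`, skipping events that lack it."""
    return Counter(e[field] for e in events if field in e)


counts = count_by(EVENTS, "action")
for value, count in counts.most_common():
    print(value, count)
```

Events without the field are left out of the counts here, so a low total can be a hint that an extraction is not matching every event type.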

Keywords

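💡Splunk

Splunk is a data analytics platform for searching, monitoring, and analyzing machine-generated data, from which meaningful business insight can be drawn. In the video, Travis discusses how he uses Splunk to parse and visualize data.

💡Field extraction

Field extraction is the process of breaking text data into smaller parts, or fields, so that it is easier to search and analyze. In the video, Travis discusses how to use the Field Extractor in Splunk to identify and create new fields, for example extracting the action type and package name from a log file.

💡Regular expressions

A regular expression is a text search and pattern matching tool that lets the user define a pattern to match specific sequences of text. In the video, Travis uses regular expressions to extract specific information from log files precisely.

💡props.conf

props.conf is a Splunk configuration file that defines properties of the data, such as time formats and field extraction rules. In the video, Travis discusses how to edit props.conf to change how data is parsed.

💡transforms.conf

transforms.conf is a Splunk configuration file that defines transformation rules. These rules can be applied before the data is indexed, or, as in the video, referenced from props.conf to extract fields at search time. Travis discusses how to use transforms.conf to process and parse the data further.

💡Data visualization

Data visualization presents data as graphics or charts so that it is easier to understand and analyze. In the video, Travis mentions creating dashboards and visualizations to present the parsed data.

💡Linux system logs

Linux system logs are log files produced by the Linux operating system that record system events and errors. In the video, Travis focuses on Linux logs, specifically the package management log on Debian and Ubuntu systems.

💡Data parsing

Data parsing is the process of turning raw data into structured data, which usually involves identifying and extracting the important information. In the video, Travis discusses how to parse Linux logs with Splunk's field extraction and the props.conf file.

💡Dashboards

A dashboard is a form of data visualization, usually a collection of charts and metrics that shows key performance indicators (KPIs) and other important data. In the video, Travis mentions creating dashboards to present the parsed data so that it is easier to understand at a glance.

💡Search and reporting

Search and reporting are the Splunk features for querying and presenting data. Users search to find specific data, then use reports and dashboards to present it visually. In the video, Travis discusses how to query the parsed data with Splunk search and plans to create reports and dashboards.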

Highlights

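Travis shares his experience working with Splunk, including how he uses it for data analysis and visualization.

Travis has used Splunk since 2009 and joined the company in 2017; he enjoys helping others understand the product.

The video focuses on field extraction in Splunk, an important step in data analysis.

Travis describes the challenges he met with Linux logs, in particular the Debian package log.

He introduces Splunk's Field Extractor and shows how to parse data with regular expressions.

He discusses the limitations of the Field Extractor and offers ways around them.

Travis shows how to use the rex command and the props.conf file to improve data parsing.

He explains how to use the transforms.conf file to customize parsing rules further.

He gives a practical example of using the online tool regex101 to learn and build regular expressions.

He discusses how to create and use a custom app in Splunk to organize configuration files.

He stresses not editing configuration files directly in the default folder, because changes there are overwritten on upgrade.

He offers insight into using Splunk's search and reporting features to create dashboards.

He explains how to find and reuse queries and dashboards created by others through the Splunk community and related websites.

He shows how to use Splunk's stats commands and fields to report on package installation and system activity.

At the end of the video, Travis encourages viewers to ask questions and take part in the Splunk community to keep learning.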

Transcripts

00:01

hi travis with splunk here

00:03

wanted to create some videos around how

00:06

does travis do you know stuff in splunk

00:09

i've been using splunk since 2009

00:12

joined splunk back in 2017 and i really

00:15

enjoy helping others understand our

00:17

product and

00:18

how to get you know better with it and

00:20

become more you know make it more useful

00:23

so i thought what better way but

00:25

create some videos on how i do stuff it

00:28

may not be you know

00:31

the best way of doing it there's

00:32

probably somebody smarter that looks at

00:34

and go oh you could do it this way but

00:35

this is how i do stuff in splunk

00:39

in this video i want to talk about field

00:41

extractions so i'm going to assume that

00:43

you've already ingested the data

00:46

or you know you're bringing data into

00:47

your splunk environment

00:49

and you're looking at it going what do i

00:51

do now

00:52

and that's actually the series you know

00:54

let's let's how do we parse the data

00:56

i'll probably have another video about

00:58

how do i actually bring the data in and

01:00

then how do i turn to report turn into

01:02

dashboards but

01:03

let's not get ahead of ourselves here

01:05

and we're talking about field

01:07

extractions and you know i have a data

01:10

set here i was actually working on

01:12

another project

01:13

around compliance and understanding hey

01:16

what is being installed

01:18

on my endpoints

01:20

you know windows linux unix and i

01:22

decided i want to focus on linux

01:24

ubuntu and debian package log

01:27

i do have yum logs coming in i have a

01:29

centos box but i'm going to talk about

01:31

debian package log

01:33

and

01:34

here is the

01:36

the raw data i have host source and

01:39

source type but i don't really have any

01:40

other fields

01:42

that i think i should have

01:44

and what i mean

01:46

here is i've got a status i've got

01:49

a startup i've got remove i have half

01:52

configured i have installed

01:55

i would like to see something over here

01:57

to help me

01:58

to be able to create

02:00

maybe uh

02:02

dashboards and visualizations around it

02:05

so i'm going to

02:06

expand this go to event actions and use

02:09

splunk's internal extract you know field

02:11

extractor

02:13

so click extract fields

02:15

bring you to the

02:17

field extractor here and this has gotten

02:19

a lot better since 2009 but it's you

02:21

know there's still some limitations and

02:23

you'll see that here in a second

02:26

i'm going to go regular expression

02:27

route and not delimited

02:30

you know there is other structured data

02:32

it's maybe spaced out by commas

02:35

or

02:37

spaces or any kind of special character

02:40

that you could go delimited route but i

02:42

can't with this

02:44

data set

02:46

so i'ma click next

02:49

and in here you'll see the event that i

02:51

highlighted or that i selected

02:53

and

02:54

you want to highlight the word you don't

02:56

want to double click and the reason why

02:58

if you double click you may get a space

03:01

behind the word

03:02

and you don't want that

03:04

so we're going to make sure we highlight

03:06

the word

03:07

and now we can say you know give it

03:09

whatever field name that you like i'm

03:11

going to call it action

03:14

um

03:15

and you can see that

03:18

here we have

03:19

matching

03:21

it's getting

03:22

what's

03:23

you know

03:24

from the onset seems like oh yeah it's

03:26

working out great

03:28

you know if i click on non-matches it

03:29

matched everything well it's a you know

03:32

i can show you the regular expression

03:34

and it's a very simple regex

03:37

and let's go back to all events

03:39

and what i mean by that

03:41

here's startup and then it

03:43

grabbed packages it didn't grab startup

03:45

startup's what i really want but because

03:47

status is here it did grab config but

03:49

didn't grab files which config-files

03:51

is all one word

03:53

half-installed i want that as all

03:55

one so i can already in linux wait

03:58

that's part of the the package

04:00

maybe we can fix the linux or maybe

04:02

maybe

04:03

we you know click on remove

04:06

and what it it's very quick it'll add it

04:08

up here

04:09

and then i can actually highlight remove

04:13

you know select the same field name if

04:15

you had multiple ones already built you

04:16

know you you know click the down box

04:18

there and go

04:21

and this is where limitations of our

04:24

field extraction utility comes in and it

04:26

breaks

04:28

so at this point

04:30

you know i'm kind of left with i can try

04:33

and do something you know

04:36

simpler

04:37

you know maybe not be i'm going to

04:39

remove this field

04:41

and maybe i want to go back here the

04:42

word status

04:44

the whole word status

04:47

and then uh

04:49

action and run that extraction

04:52

you know and would i be happy with that

04:54

and then try to

04:56

you know match the rest

04:58

but

04:59

you know if i

05:00

i don't want just status because i want

05:02

half installed and half configured

05:04

you know we can also take

05:06

the side note

05:08

you can take whatever regular expression

05:10

that you've you know splunk's built here

05:13

and test it

05:14

there's couple there's two different

05:15

ways we can click view in search

05:18

which will open up another search and

05:21

have it there

05:22

but i like to just go back to my

05:25

original search

05:27

and

05:29

pipe

05:31

rex

05:32

and if you've never used the rex command

05:36

you're new to splunk

05:38

um

05:40

there's a lot of commands out there

05:42

and if you

05:44

need more help or more information

05:47

there's a couple different ways we can

05:48

go about it uh inside of splunk you you

05:52

can see right here i have the word help

05:54

and i can go to the you know

05:56

page or splunk documentation

05:59

to show us the command and and all the

06:02

other commands and examples and whatnot

06:06

or i can click more right here now if

06:09

you're not getting this kind of

06:11

information

06:12

go up to your username i'm administrator

06:15

click on preferences

06:17

and then spl editor

06:20

and click on full you may be on compact

06:24

for the search assistant and if you

06:25

liked when i hit the pipe it dropped a

06:27

new line you know just check you know

06:30

search auto format

06:32

so i'm gonna set it on full then hit

06:34

apply

06:35

and so when i hit a pipe it drops a new

06:37

line and now i have more information

06:39

when i do

06:41

my different commands

06:43

so i'm going to wrap this in quotes as

06:45

you will need to wrap it in quotes

06:49

and

06:50

paste what i have in there

06:52

and hit enter

06:55

and yeah during this video you're going

06:57

to get all my little

06:59

mistakes

07:00

you know me correcting myself

07:02

i don't like doing multiple videos or

07:05

you know you modif mesh them all

07:07

together i just do it all in one take so

07:09

there the search is

07:10

finished and we can see i have a new

07:12

field called action and i can see status

07:14

startup configure upgrade i mean i mean

07:16

good start

07:18

but it's not what i want and there is

07:20

another way that we can go

07:22

and really

07:24

you know fine-tune how we're going to

07:26

extract

07:27

and be able to parse each one of these

07:29

events and that's using the props and

07:31

transforms

07:33

so i'm going to clear this out here

07:35

and you could

07:37

you know go back

07:39

to you know google search

07:42

and say okay splunk documentation

07:45

transforms splunk documentation props.conf

07:49

um you could go in here and start

07:51

reading the different

07:54

you know documentation that we have

07:56

around

07:57

what props and transforms is and i'll

07:59

explain a little bit more here in a

08:01

second

08:02

here is our transforms our documentation

08:04

we have examples and then you can come

08:06

up here and see what all those different

08:08

pieces mean and what you can do

08:10

same thing with props you know

08:13

examples and you know what does all this

08:15

stuff really mean and if you want to

08:16

take that time or

08:18

to read all that

08:20

you know go at it

08:22

so

08:24

i need to i'm going to close this screen

08:26

out

08:27

and what i want to do now is switch to

08:29

another screen

08:32

and show you

08:34

you know from that

08:36

splunk

08:37

add-on for unix and linux you can go out

08:40

there download that file unzip it

08:43

and then start reverse engineering

08:45

tearing it apart

08:47

and this is you know what i've already

08:48

been working on for

08:50

you know this presentation here

08:52

but if i go to the props.conf

08:55

you know i open the one that's in the

08:56

default folder

08:58

side note do not edit anything in the

09:00

default folder especially if you're you

09:03

know live in your environment

09:05

you will want to make a copy of the

09:07

configuration file the conf file

09:10

you know make a copy of props.conf make a

09:13

copy of transforms.conf

09:16

or inputs.conf and move it to a local

09:18

folder for example i have an inputs.conf

09:21

file here

09:22

and i have it in a local folder

09:26

so this is where if i need to make any

09:28

adjustments or changes to how i'm

09:30

explaining here how i'm telling my

09:32

universal forwarders to you know what

09:34

data to send i go into the local copy

09:37

because if you edit anything in default

09:40

it will uh be overwritten if you do an

09:42

upgrade

09:45

so props here we have the

09:47

splunk

09:48

unix you know add-on for unix and linux

09:51

props.conf file

09:53

you know i could do and that's what i

09:55

did you know when i was first figuring

09:57

out hey

09:58

this isn't helping me with my

10:00

my package log you know i brought up a

10:03

find

10:05

and then did the search and look for

10:08

anything

10:09

and can't find text

10:11

if i do syslog

10:13

i mean it will find

10:15

you know the syslog data there

10:17

or

10:18

you will find the reference to syslog

10:21

and that's what the props does if you

10:23

have a source or source type if you want

10:25

to rename your source

10:27

so or you want to rename the

10:30

the source type and what i mean by that

10:32

let me go back to my

10:34

windows 7 my chrome here

10:37

and if i look at source type i have

10:40

you know multiple different source types

10:44

so going back over here

10:47

i can see that hey if it's source and it

10:50

ends with syslog

10:53

call it source type equals syslog

10:56

and the reason why

10:58

our app is doing this is if i go back to

11:01

my inputs.conf file

11:03

and if i do a search in here for var

11:06

log

11:08

you can see when i

11:10

i monitor var log the folder

11:13

so you can monitor

11:15

you know to a

11:17

log file or you can monitor a folder

11:20

and say hey

11:22

give me anything that's dot log give me

11:23

anything it's

11:24

ins and messages anything that's off and

11:28

don't send me this information

11:31

and there's i mean there's a couple of

11:33

different ways if you want to specify

11:35

var log then you can specify your source

11:37

type there like here in the bash history

11:40

you know root dot bash history and then

11:43

this is specified you know bash history

11:47

so since i am using the add-on

11:51

i'm getting all of these i'm not

11:53

specifying my source type in my

11:55

inputs.conf file

11:56

so then that's why you have to use a

11:58

props to look at the hey it's this

12:00

source

12:01

rename the source type to syslog and

12:04

then look in syslog

12:07

the only

12:09

i mean the only drawback to this way

12:13

is any any new data this will work for

12:16

but any index data that you've already

12:18

indexed

12:21

you have to have some of these settings

12:23

out on your universal forwarder so it knows

12:26

that index you know when it's sent in

12:27

the data it's rewriting the source type

12:31

so this or when it first comes into the

12:33

indexer it writes the source type

12:36

so at search time i mean this doesn't

12:38

help

12:40

so if we want to build something at

12:41

search time you know this is where

12:44

you can

12:45

instead use and here's props and this is

12:48

what i just explained

12:50

you can say hey if you see source

12:54

and it ends at you know slash dpkg.log

13:00

apply

13:01

these actions to that source

13:04

and when you are editing or you're

13:06

creating this make sure you only have

13:07

three dots

13:09

um i accidentally put four in and then i

13:11

couldn't figure out for like a half an

13:12

hour

13:13

what was why my

13:15

props and transforms wasn't working then

13:17

i realized i put four dots instead of

13:20

three

13:21

so splunk is a little sensitive there

13:24

and then you see here you know dpkg

13:27

startup

13:28

if i highlight that one

13:30

you see how it matches up here

13:33

you know installed

13:36

it matches you know up here

13:38

installed and how did i create all of

13:40

these so this is transforms and this is

13:43

what

13:43

you know after the props goes hey

13:46

here's a source

13:48

go to this report and do these actions

13:52

and these actions are going to be hey

13:53

here's a regex pattern for installed i

13:57

need

13:57

this

13:59

raw event broken up like this

14:02

so how did i get and how did i figure

14:05

out this regex pattern

14:08

easy i went to regex101 so let me switch

14:10

over to that screen again

14:13

and you can come in here and you grab

14:15

your

14:17

event

14:18

and go to regex101 you can see i've

14:20

already copied

14:22

the event into this regex101 this is a

14:25

on the web

14:26

it's a great utility

14:29

especially for people who don't know

14:30

much about

14:32

regular expressions it gives you helpful

14:34

information quick references off to the

14:36

right and explanations

14:38

i said okay i want to my first capture

14:40

group

14:41

i want installed