NAME

flow-tools - tools for saving, printing and otherwise analyzing CISCO Netflow logs


DESCRIPTION

Many CISCO products use Netflow processing to improve performance. Briefly, in Netflow processing, routing information for traffic is cached in a flow table so that future similar traffic can be routed/switched more efficiently. One of the side effects of Netflow processing is that you can collect logs of the flows that were created and use this information for traffic analysis. OSU has created a suite of tools that make it easy to collect and analyze these flow logs in various ways - we call this package flow-tools.


WHAT IS A FLOW?

Good question. Briefly, a flow is a bunch of similar traffic - same IP protocol type, same source and destination IP addresses, same TCP or UDP port numbers. Each flow record collected from a Netflow enabled router or switch represents one or more packets from the source to the destination. In addition to this information, flows also have other attributes, such as a starting and ending time, source and destination interface numbers, an OR of the TCP flags (for flows associated with TCP traffic), and octet and packet counts.
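To make the definition concrete, here is a minimal Python sketch of how packets might be grouped into flows. It is not code from flow-tools or from any router. The names FlowKey, FlowRecord and account are hypothetical, and the interface numbers are left out for brevity.

```python
from dataclasses import dataclass
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The fields that make packets "similar" (hypothetical names)."""
    protocol: int   # IP protocol number, e.g. 6 = TCP, 17 = UDP
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int


@dataclass
class FlowRecord:
    """Attributes accumulated over all packets of one flow."""
    start: float        # time the first packet was seen
    end: float          # time the most recent packet was seen
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0  # bitwise OR of the TCP flags seen so far


def account(table, key, when, size, tcp_flags=0):
    """Add one packet to the flow identified by key, creating it if needed."""
    rec = table.get(key)
    if rec is None:
        rec = table[key] = FlowRecord(start=when, end=when)
    rec.end = when
    rec.packets += 1
    rec.octets += size
    rec.tcp_flags |= tcp_flags
    return rec


table = {}
key = FlowKey(6, "10.0.0.1", "10.0.0.2", 40000, 80)
account(table, key, when=0.0, size=60, tcp_flags=0x02)    # SYN
account(table, key, when=0.5, size=1500, tcp_flags=0x18)  # PSH|ACK
print(table[key])
```

Note that the key is directional: the reply packets from 10.0.0.2 to 10.0.0.1 would land in a second, separate flow.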

Flows are created when traffic is first seen which doesn't have an entry in the flow table. Flow table entries are removed under the following conditions:

No traffic for the flow has been seen in 15 seconds.
The flow is more than 30 minutes old (this timeout allows long lasting flows to show up in logging periodically, which is helpful).
For TCP traffic, at the end of the TCP session.
If the flow table fills, flows are removed early, at the router's discretion.

When flows are removed from the table and flow logging is enabled, a record describing the flow is sent by UDP to a flow collector host. See flow-collect below.
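The removal conditions listed above can be sketched as a single check. This is only an illustration of the rules as described here, not the router's actual implementation. The function name, the reason strings and the order in which the conditions are tested are assumptions.

```python
IDLE_TIMEOUT = 15          # seconds without traffic for the flow
ACTIVE_TIMEOUT = 30 * 60   # maximum age of a flow, in seconds


def should_expire(start, last_seen, now,
                  tcp_session_ended=False, table_full=False):
    """Return the reason a flow entry would be removed, or None to keep it.

    start and last_seen are the times of the first and most recent
    packet of the flow; now is the current time (all in seconds).
    """
    if tcp_session_ended:
        return "tcp-end"
    if now - last_seen >= IDLE_TIMEOUT:
        return "idle"
    if now - start > ACTIVE_TIMEOUT:
        return "too-old"
    if table_full:
        # Assumption: the caller decides which flows to evict early.
        return "table-full"
    return None


print(should_expire(start=0, last_seen=0, now=10))
print(should_expire(start=0, last_seen=0, now=15))
print(should_expire(start=0, last_seen=1795, now=1801))
```

The third call is the case the 30 minute timeout exists for: the flow is still busy, but it is exported anyway so that it shows up in the logs.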


A BRIEF DESCRIPTION OF THE FLOW-TOOLS COLLECTION

Most of the tools follow the UNIX pipeline style - they can read flow records on stdin, and write their results to stdout.

Several of the tools facilitate the collection and management of large amounts of flow traffic:

flow-collect listens for flow records sent from a Netflow enabled device and stores them in compressed form as files. You can optionally have flow-collect implement an expiration policy to keep your flow logs from consuming all available disk space.
flow-mirror is a simple wrapper for fetch that retrieves finished flow logs from one flow collector host to another. We use this to collect logs from certain routers for long term storage.
flow-expire can implement the same expiration policies as flow-collect. We use this to manage the flow logs on the archive host, which isn't running flow-collect.

Collected flows are useless unless you can examine them:

flow-cat concatenates flow logs together, sort of like cat.
flow-print prints the contents of a flow log in one of several human readable formats.
flow-filter and flow-search are used for data reduction. flow-filter allows you to filter flows by source and destination IP addresses (with masks, using CISCO style standard ACLs), AS numbers, ports, interfaces, IP protocol type and so on. flow-search is a useful script that makes it easy to do IP address ACL filtering on a large number of log files.
flow-stat generates various statistical summaries of what it finds in a flow log.

Debug level 1 or above will produce flows processed per second stats to stderr. This is useful for deciding what compression level to use or how big a box to get for post processing.

flow-receive, flow-capture, and flow-filter allow updating of the comment field with the -C option. This can be viewed with the -p option to flow-print.


HOW DOES OSU COLLECT FLOWS?

Most of our backbone routers have Netflow processing enabled. Each router directs its flow records to a nearby flow collector host. The flow collector hosts are typically located in the same wiring closet, to avoid passing the flow traffic across the backbone (not that that's a huge problem). Each collector has several tens of gigabytes of disk space dedicated to storing flow logs. We typically aim for storing several days' or a week's worth of flows per router on these collector hosts. Flows are collected using the flow-collect program. We use the command line options to set expiration policies and working directories for the logs, and we typically set it up to store the flows in chunks of 15 minutes per file.

We also like to keep a longer history of flow data for the routers that connect us to the outside world. This history is invaluable for traffic analysis, incident response, and so on. We have a central flow collector host with several hundred gigabytes of disk space, and we store roughly 150 days' worth of flows for each of our routers that connect us to the outside world (including peer links to local ISPs and links to our branch campuses). The logs are copied from the ``backbone'' flow collectors using flow-mirror, and the archive's expiration policy is implemented using flow-expire.


WHAT DO WE DO WITH THE FLOWS?

We use the flows for many things.


Network Engineering

We use the flows for debugging, traffic analysis, and general planning (among other things). For instance, we can answer questions like ``how much of our Internet traffic goes to/comes from other CIC schools?'', or ``how much of the Internet traffic is web, nntp, smtp, and so on?'' We can also assess the success of initiatives to reduce our Internet bandwidth.


Billing

We are moving toward usage based billing for a variety of reasons that aren't important here. The flow logs facilitate this in a number of ways.

Given a list of which departments use which IP address ranges (and we do have such a database), we can tally up the amount of traffic (incoming, outgoing, total octets and packets, as raw numbers and percent of the total) for each department and run it through some sort of billing system.
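A tally of this kind might look like the following Python sketch. It is not our billing code. The department table, the addresses (taken from the documentation ranges) and the function names are made up for illustration, and the flows are assumed to have already been reduced to (source, destination, octets, packets) tuples.

```python
import ipaddress
from collections import defaultdict

# Hypothetical department database: network -> department name.
DEPARTMENTS = {
    ipaddress.ip_network("192.0.2.0/25"): "Physics",
    ipaddress.ip_network("192.0.2.128/25"): "History",
}


def department_of(addr):
    """Return the department owning addr, or None for outside addresses."""
    ip = ipaddress.ip_address(addr)
    for net, name in DEPARTMENTS.items():
        if ip in net:
            return name
    return None


def tally(flows):
    """Sum octets and packets per department, split by direction."""
    totals = defaultdict(lambda: {"out_octets": 0, "in_octets": 0,
                                  "out_packets": 0, "in_packets": 0})
    for src, dst, octets, packets in flows:
        src_dept = department_of(src)
        dst_dept = department_of(dst)
        if src_dept is not None:
            totals[src_dept]["out_octets"] += octets
            totals[src_dept]["out_packets"] += packets
        if dst_dept is not None:
            totals[dst_dept]["in_octets"] += octets
            totals[dst_dept]["in_packets"] += packets
    return dict(totals)


def percent_of_total(totals, field):
    """Express one field of each department as a percent of the sum."""
    grand = sum(t[field] for t in totals.values())
    return {dept: (100.0 * t[field] / grand if grand else 0.0)
            for dept, t in totals.items()}


flows = [
    ("192.0.2.10", "198.51.100.7", 3000, 3),
    ("198.51.100.7", "192.0.2.10", 9000, 8),
    ("192.0.2.200", "203.0.113.5", 1000, 2),
]
totals = tally(flows)
print(totals)
print(percent_of_total(totals, "out_octets"))
```

Traffic between two departments is counted once as outgoing for the sender and once as incoming for the receiver. Whether that is the right policy is a billing decision, not something the flows decide for you.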

Once we start billing, we expect to have to do a fair amount of bill dispute resolution - we'll be able to use the flows to show units exactly where their major traffic areas are, and help them reduce their bandwidth consumption.


Security

We use the flows most of all for computer security practices at OSU. For instance, if we receive a complaint about network activity stemming from OSU, in many cases we can use the flow logs to confirm whether in fact the alleged activity occurred from OSU (it might have been spoofed, after all :-), and if so, where else the intruders went, how they broke into the OSU hosts (if they came in through the network connection) and so on.

We also do some limited intrusion detection sorts of things with the flows. See flow-dscan for details.

We have occasionally helped groups on campus design firewalls. When we consult with a group, we can use the flow logs to get a better understanding of the activity to and from the network.


INTERPRETING FLOWS

There are some issues that you have to keep in mind when you try to read through flow logs and interpret them.

For one thing, note that a flow is not a connection or a session, at least not in the TCP sense of the word. A flow is a bunch of traffic that is similar, no more. A TCP connection will always consist of at least two flows - one for traffic from the client to the server, the other for traffic from the server to the client. Note that a TCP connection can (and often will) consist of many flows. This is especially true if the flow of packets pauses for 15 seconds or more (since the flows will time out then), or if the connection lasts longer than 30 minutes, or if the flow table keeps filling.

We prefer to view the IP addresses and TCP and UDP ports as numbers, rather than converting them to names. If you choose to view them as names, keep in mind that the flow data doesn't contain the names, and the names can be misleading. For instance, the names that you resolve the IP addresses into may be incorrect due to cache poisoning or domain hijacking. A flow of TCP traffic with a destination port of 80 is probably traffic to a service on port 80 on some host, and that service is most likely a web server, but you don't know that it is a web server until you've independently confirmed it. Nor can you tell from that single flow alone whether it represents traffic to a server on port 80, or traffic from some other server to a client that's using port 80 (though this is unlikely, since clients, on UNIX hosts at least, usually use ports > 1024).

It can be difficult to infer which of the two hosts is the client and which is the server. In practice you should be able to do this by examining the flows in both directions and looking for the initial traffic - that's the client, contacting the server. flow-connect can help here.
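The heuristic can be sketched in a few lines of Python. This is not how flow-connect works internally (we make no claim about that here). The function name and the dictionary keys are hypothetical, and the input is assumed to hold only the flows of one conversation, in both directions.

```python
def guess_client(flows):
    """Guess (client, server) from the flow with the earliest start time.

    flows: non-empty iterable of dicts with 'src', 'dst' and 'start' keys
    (hypothetical field names) covering both directions of one conversation.
    """
    first = min(flows, key=lambda f: f["start"])
    return first["src"], first["dst"]


conversation = [
    {"src": "192.0.2.80", "dst": "198.51.100.7", "start": 10.2},
    {"src": "198.51.100.7", "dst": "192.0.2.80", "start": 10.0},
]
client, server = guess_client(conversation)
print(client, "contacted", server)
```

Treat the result as a guess. If the first flow of the conversation was lost, or you only collect flows for one direction, the earliest flow you have is not necessarily the client's.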

When you examine the flows, note that they appear in order of the ending time of the flow, since the flow records are created when the flow is removed from the table. This can often obscure the client/server relationships we discussed in the previous paragraph. It can also cause confusion in other ways. Suppose someone uses telnet to log in through the network to a host, and once there, reads email from a POP server and browses the web for a while. When you examine the flows, you would expect to see some telnet activity to and from the host, and interleaved with that, flows for the POP and web activity. However, if the telnet connection is ``busy'' and the traffic doesn't pause long enough for the flows to time out, you'll actually see the flows for the POP and web activity up to 30 minutes before any of the flows for the telnet activity show up. Never forget that the flow records are created when the flow ends, so there's a time delay between the start of the activity and the appearance of the corresponding flow record. You can sort flows into chronological order by the starting time of the flow by either sorting the output of flow-print or by using the flow-sort program.
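The telnet example looks like this in miniature. The records below are invented (start time, end time, description) tuples, not flow-print output, listed in the order they would be logged, that is, by ending time.

```python
from operator import itemgetter

# (start, end, description), in logged order: sorted by end time.
logged = [
    (100.0, 105.0, "POP"),
    (110.0, 140.0, "web"),
    (10.0, 1810.0, "telnet"),
]

# Re-sort by starting time to recover the order of the activity itself.
chronological = sorted(logged, key=itemgetter(0))

for start, end, what in chronological:
    print(start, end, what)
```

After sorting, the telnet flow that started first moves to the front, even though its record was the last to be written.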

If your network configuration contains multiple paths to the hosts whose traffic you are examining, you should beware the possibility of asymmetric routing and the effect that this will have on the flows. In brief, traffic from host A to host B might pass through one border router, router 1, and return traffic from B to A might pass through our other border router, router 2. In that case, the outbound traffic will appear in flows from router 1, but router 1 will have no flows for traffic from B to A (inbound traffic). To get both the inbound and outbound traffic, you would have to collect flows from both routers and merge the flows together. We haven't written useful tools to do that yet :-)
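The merge step itself is simple to sketch in Python, assuming each router's records are already in ending-time order (as the logs are) and the two routers' clocks agree. The tuples below are invented (end time, description) pairs, not a flow-tools file format.

```python
import heapq

# Records from each router, each list already sorted by end time.
router1 = [(105.0, "A->B"), (230.0, "A->B")]
router2 = [(106.0, "B->A"), (229.0, "B->A")]

# heapq.merge lazily interleaves the sorted inputs into one sorted stream.
merged = list(heapq.merge(router1, router2, key=lambda rec: rec[0]))

for end, what in merged:
    print(end, what)
```

If the routers' clocks are not synchronized, the interleaving will be off by the clock difference, so do not read too much into the relative order of flows from different routers.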

Note of course that tunneled traffic (through PPTP, GRE, and even SMB and SSH) ``disappears'' under the higher level tunneled layer (e.g., the connections you've forwarded through an SSH connection will appear as parts of the flow for the SSH connection, rather than separately).

ICMP type/code are encoded in the destination port field (one byte each).
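Decoding the field is a matter of splitting the 16 bit value into its two bytes. The sketch below assumes the type is in the high-order byte and the code in the low-order byte. That byte order is an assumption not stated above, so check it against traffic you recognize (an echo request, say) before relying on it.

```python
def icmp_type_code(dst_port):
    """Split a 16 bit destination port field into (icmp_type, icmp_code).

    Assumes type in the high-order byte, code in the low-order byte.
    """
    return (dst_port >> 8) & 0xFF, dst_port & 0xFF


# 771 = 3 * 256 + 3: destination unreachable (type 3), port unreachable (code 3)
print(icmp_type_code(771))
# 2048 = 8 * 256 + 0: echo request (type 8), code 0
print(icmp_type_code(2048))
```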

ICMP can be difficult to tie to its cause. For ``normal'' ICMP denoting errors, you can often infer the traffic that caused the ICMP error from the source IP address of the ICMP flow - look for traffic going in the reverse direction. The difficulty arises in that the IP and TCP/UDP headers from the ``offending'' traffic are not included in the flow records.

Of course many attacks (probes, denial of service attacks, exploits) are perpetrated by sending strange, ``unexpected'' traffic to a target. Consequently, the ICMP ``destination unreachable (port)'' packets that you see in the flow logs might not be in response to traffic - they might be a probe/denial of service attack/back door communications.

Beware spoofed traffic in all its forms.


RELATED WORK

caida brumley? other guy?


SEE ALSO

flow-capture(1), flow-cat(1), flow-connect(1), flow-dscan(1), flow-expire(1), flow-export(1), flow-fanout(1), flow-filter(1), flow-gen(1), flow-interfaces(1), flow-print(1), flow-profile(1), flow-receive(1), flow-search(1), flow-send(1), flow-sort(1), flow-stat(1)