Such scripts are normally custom code. So even if one already exists, which is likely, you may still need to perform time-consuming re-checks, re-tests, and re-validations just for dataset preparation.
It may be more rational to design, test, and validate your machine learning algorithms or systems on an already prepared dataset first, rather than spending your time converting and cleaning datasets. Then, if your algorithms work, you can go back and prepare your own datasets in addition, if necessary.
Using tshark on Linux, the command used to extract the TCP conversations is:
tshark -r <input.pcap> -q -z conv,tcp > <output.txt>
The above command reads the pcap (dump) file, extracts the conversation list, filters the TCP conversations, and writes them to an output file in text format, in this case ordered by flow size.
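The conversation table is plain text; if you need it as CSV for further processing, a small awk filter can pull out the endpoints and totals. The column layout assumed below is typical of tshark's conv,tcp output but may differ between versions, and the hard-coded sample line stands in for real tshark output, so check your own capture before relying on the field positions.

```shell
#!/bin/sh
# Sample line standing in for real `tshark -q -z conv,tcp` output
# (assumed layout: A <-> B, then per-direction and total frame/byte counts).
cat <<'EOF' > conv.txt
10.0.0.1:1234  <->  192.168.0.2:80      10  1200     12  3400     22  4600
EOF

# Keep only conversation rows (marked by "<->") and emit
# src,dst,total_frames,total_bytes as CSV ($8/$9 are assumed positions).
awk '$2 == "<->" { print $1 "," $3 "," $8 "," $9 }' conv.txt > conv.csv
cat conv.csv
```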
Recently, we used Benford's law to extract features and detect anomalies in network flows using machine learning. The article can be found at https://arxiv.org/pdf/1609.04214.pdf . The code is implemented in MATLAB and is also available to download from the links in the paper.
A simple shell script for chunk-processing a very large tcpdump capture is attached as extract_tcpdump.zip.
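The contents of the attached zip are not shown here, but the general chunked workflow can be sketched independently with editcap, which ships with Wireshark. This is an assumption-laden reconstruction, not the attached script; file names and chunk size are placeholders.

```shell
#!/bin/sh
# Hedged sketch of chunk-processing a very large capture. NOT the attached
# extract_tcpdump.zip (contents unknown); an independent reconstruction
# using editcap from the Wireshark suite.
BIG=big_capture.pcap     # placeholder input file
CHUNK_PKTS=100000        # packets per chunk (tune to available memory)

if command -v editcap >/dev/null 2>&1 && [ -f "$BIG" ]; then
    # Split the big capture into chunk_00000_*.pcap, chunk_00001_*.pcap, ...
    editcap -c "$CHUNK_PKTS" "$BIG" chunk.pcap
    # Process each chunk independently so memory use stays bounded.
    for f in chunk_*.pcap; do
        tshark -r "$f" -q -z conv,tcp > "${f%.pcap}_conv.txt"
    done
    echo "done" > chunk_status.txt
else
    # Record that nothing was processed (tools or input missing).
    echo "skipped" > chunk_status.txt
fi
```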
Thanks for the information. Are you trying to replicate the dataset, or are you planning to find new features that would be better than the existing ones?
We had similar requirements in our research and developed our own tools for this purpose.
If you know the attributes you want to extract, you can use pcap-processor (https://github.com/slgobinath/pcap-processor) to extract and pre-process them and write them to a CSV file or anywhere else.
My colleague @Nadun Rajasinghe has developed a framework to create KDD-like datasets from pcap files (https://github.com/nrajasin/Network-intrusion-dataset-creator), which may be useful for your research.
Hi Gobinath Loganathan. I tried running https://github.com/nrajasin/Network-intrusion-dataset-creator, but it generates a lot of errors while installing requirements.txt.
Please open GitHub issues in the project itself; Nadun may help you solve the problems there. I can help you with the pcap-processor. To generate CSV files using the pcap-processor, you can use the following command:
For pcap files you can use tshark. For example, the following command reads the INPUT.PCAP file, retrieves the fields given by the -e arguments, applies the display filter given by the -Y argument, and writes the output to OUTPUT.CSV using "," as the separator (CSV):
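A concrete form of that command can be sketched as follows. The field names after -e and the "tcp" display filter are illustrative choices, not necessarily the ones the poster used, and the input/output names are placeholders; the fallback branch only exists so the script degrades gracefully where tshark or the capture is absent.

```shell
#!/bin/sh
# Hedged example of tshark field extraction to CSV; field list and filter
# are illustrative assumptions, file names are placeholders.
IN=INPUT.PCAP
OUT=OUTPUT.CSV

if command -v tshark >/dev/null 2>&1 && [ -f "$IN" ]; then
    tshark -r "$IN" -Y "tcp" -T fields \
        -E header=y -E separator=, -E quote=d \
        -e frame.time_epoch -e ip.src -e tcp.srcport \
        -e ip.dst -e tcp.dstport -e frame.len \
        > "$OUT"
else
    # tshark or the capture is unavailable: emit a header-only CSV so
    # downstream steps still have the expected schema.
    echo "frame.time_epoch,ip.src,tcp.srcport,ip.dst,tcp.dstport,frame.len" > "$OUT"
fi
head -n 1 "$OUT"
```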