Yashar Salami Thank you for the suggestions. Yes, I initially tried a few text samples converted to fixed-size ASCII to test the algorithm for the avalanche effect and entropy. I have yet to check a few more parameters. But I need more samples, so I thought a dataset would let me generalize the results better.
Günter Fahrnberger Not exactly. Maybe a fixed-length text generator and the ASCII equivalent of its output. I just found a video on this (link attached) but have yet to explore it.
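For reference, the two measurements mentioned above could be sketched roughly as follows in Python. This is only an illustration under assumptions: `toy_cipher` is a hypothetical placeholder (SHA-256 standing in for the cipher under test, since the modified algorithm is not shown in this thread), and all function names are made up for the example.

```python
import hashlib
import math
from collections import Counter


def toy_cipher(block: bytes) -> bytes:
    # Placeholder only: SHA-256 stands in for the cipher under test.
    # Replace this with a call to the real encryption routine.
    return hashlib.sha256(block).digest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    # Return a copy of data with exactly one bit inverted.
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def avalanche(encrypt, plaintext: bytes) -> float:
    # Mean fraction of output bits that change when a single
    # input bit is flipped, averaged over every input bit position.
    base = encrypt(plaintext)
    total_bits = len(base) * 8
    ratios = []
    for i in range(len(plaintext) * 8):
        changed = encrypt(flip_bit(plaintext, i))
        diff = sum(bin(a ^ b).count("1") for a, b in zip(base, changed))
        ratios.append(diff / total_bits)
    return sum(ratios) / len(ratios)


def shannon_entropy(data: bytes) -> float:
    # Shannon entropy in bits per byte (0.0 to 8.0).
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

A value of `avalanche(...)` close to 0.5 is the usual target, and `shannon_entropy` applied to a long ciphertext should approach 8 bits per byte.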
Using a modified AES is not recommended; there are many ways to make mistakes. However, if you still want to modify it and create a new algorithm, I suggest considering some important aspects:
a. You can modify the built-in AES S-box. There are other "perfect" S-boxes that can be used without sacrificing security. See, for example, "Generation of AES S-boxes with Various Modulus and Additive Constant Polynomials and Testing their Randomization" by S. Das, J.K.M.S. Uz Zaman, R. Ghosh.
b. Do not use reduced-round AES. If you want a more "cost-effective" algorithm, use a lightweight block cipher that is accepted by the crypto community (e.g., PRINCE, PRESENT).
c. You can combine AES with a different algorithm to create a new one. How can this be done securely? For example, in 1993 Maurer and Massey published the article "Cascade ciphers: The importance of being first" on how to cascade ciphers. This theory can be used to combine two ciphers so that, with independent keys, the cascade is at least as secure as the first cipher in it. (So if you combine your algorithm with AES in a proper way, with AES applied first, you do not need to worry about the security of the new one.)
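To make the cascade idea in point c concrete, here is a structural sketch in Python. It only shows the composition order and the use of two independent keys. Both stages are hypothetical toy ciphers built from SHA-256 and are not secure; in a real design the first stage would be AES from a vetted library and the second stage the new algorithm.

```python
import hashlib


def _pad(key: bytes, n: int) -> bytes:
    # Derive n pseudo-random bytes from key (toy construction, not secure).
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def first_encrypt(key: bytes, data: bytes) -> bytes:
    # Toy stand-in for the FIRST cipher (AES in a real design): XOR with a pad.
    return bytes(d ^ p for d, p in zip(data, _pad(key, len(data))))


first_decrypt = first_encrypt  # XOR is its own inverse


def second_encrypt(key: bytes, data: bytes) -> bytes:
    # Toy stand-in for the SECOND cipher: byte-wise addition mod 256.
    return bytes((d + p) % 256 for d, p in zip(data, _pad(key, len(data))))


def second_decrypt(key: bytes, data: bytes) -> bytes:
    return bytes((d - p) % 256 for d, p in zip(data, _pad(key, len(data))))


def cascade_encrypt(k1: bytes, k2: bytes, plaintext: bytes) -> bytes:
    # The first cipher is applied first; k1 and k2 must be independent.
    return second_encrypt(k2, first_encrypt(k1, plaintext))


def cascade_decrypt(k1: bytes, k2: bytes, ciphertext: bytes) -> bytes:
    # Undo the stages in reverse order.
    return first_decrypt(k1, second_decrypt(k2, ciphertext))
```

Deriving `k2` from `k1` would break the independence assumption that the cascade argument relies on.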
If you design a new modified algorithm and it can be proved that the new algorithm is at least as secure as the original AES, then there is no need for entropy tests, NIST randomness tests, etc. All security-related tests that AES passes, the new algorithm will pass too.
Creating new ciphers is indeed a very interesting topic and can lead to new discoveries. Good luck with your work.
Günter Fahrnberger I am working in MATLAB on Windows. However, with the help of the video https://www.youtube.com/watch?v=IWniw87D0Ss I have tried to generate a dataset.
Günter Fahrnberger Yes, I felt that too. Recently I came across a few online random text generators and text-to-ASCII converters. I think those would suffice.
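As an offline alternative to the online tools, a fixed-length random text generator with ASCII conversion can be sketched in a few lines of Python. The alphabet, sample length, and function names below are assumptions chosen for illustration; the seed just makes the dataset reproducible.

```python
import random
import string
from typing import List


def random_text(length: int, rng: random.Random) -> str:
    # Draw `length` characters from letters, digits, and space.
    alphabet = string.ascii_letters + string.digits + " "
    return "".join(rng.choice(alphabet) for _ in range(length))


def to_ascii_codes(text: str) -> List[int]:
    # Convert text to its ASCII code values; raises on non-ASCII input.
    return list(text.encode("ascii"))


def make_dataset(n_samples: int, length: int, seed: int = 0) -> List[str]:
    # Build n_samples strings, each exactly `length` characters long.
    rng = random.Random(seed)
    return [random_text(length, rng) for _ in range(n_samples)]
```

For example, `make_dataset(1000, 16)` gives 1000 samples of 16 characters each, which corresponds to one 128-bit block per sample once converted with `to_ascii_codes`.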
You can generate a simple text collection dataset for practice.
Write a simple program to parse a document file (PDF / DOC / EPUB).
It is better to take any good book of 100+ pages and read it using Python or any other language you are familiar with.
Read every page of the book programmatically, extract the first line of each page, and save it.
At the end you have a dataset whose number of lines equals the number of pages in the book, with varied but authentic text extracted from the book's content.
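The steps above could look roughly like this in Python. The sketch assumes the page texts have already been extracted into a list of strings, one per page; the function names and the output file path are hypothetical.

```python
from typing import Iterable, List


def first_line(page_text: str) -> str:
    # Return the first non-blank line of a page, or "" if the page is empty.
    for line in page_text.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return ""


def build_dataset(pages: Iterable[str]) -> List[str]:
    # One entry per page that contains text; blank pages are skipped,
    # so the result can be shorter than the page count.
    lines = [first_line(text) for text in pages]
    return [line for line in lines if line]


def save_dataset(lines: List[str], path: str) -> None:
    # Write one sample per line as UTF-8 text.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

With a PDF library such as pypdf (an assumption, it is not named in this thread), the page strings could come from calling `extract_text()` on each page in `PdfReader(path).pages`.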