I am performing an empirical study on deep learning libraries and have therefore collected issue reports from the following libraries on GitHub: **PyTorch, Keras, Caffe, scikit-learn, Apache MXNet, TensorFlow**.
I have used these keywords to extract the bug reports, as I found that they appear often in actual CVEs:
1. Buffer overflow – stack overflow, heap overflow, buffer overflow
1. Integer overflow – integer overflow, underflow
1. Out of bounds access – OOB, out-of-bounds
1. Segmentation fault – segfault, segmentation fault
1. Denial of service – DoS, crash (the only one of these keywords that returns results)
1. Memory/Data corruption – memory corruption, data corruption
1. Type confusion – type confusion
1. Division by zero – divide by zero, division by zero
1. Incomplete validation – validation, incomplete validation, invalid validation
1. Null pointer dereference – null pointer, nullptr
1. Data leak – data leak, memory leak
1. Integer truncation – truncation
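For anyone curious how I'm filtering, the keyword search above amounts to a simple substring matcher over issue text. Here is a minimal sketch (the category names and the `match_categories` function are my own illustrative choices, not from any library):

```python
# Mapping of vulnerability categories to search keywords, mirroring the
# list above. Matching is case-insensitive substring search.
KEYWORDS = {
    "buffer_overflow": ["stack overflow", "heap overflow", "buffer overflow"],
    "integer_overflow": ["integer overflow", "underflow"],
    "out_of_bounds": ["oob", "out-of-bounds"],
    "segfault": ["segfault", "segmentation fault"],
    "denial_of_service": ["crash"],
    "memory_corruption": ["memory corruption", "data corruption"],
    "type_confusion": ["type confusion"],
    "division_by_zero": ["divide by zero", "division by zero"],
    "incomplete_validation": ["incomplete validation", "invalid validation",
                              "validation"],
    "null_pointer": ["null pointer", "nullptr"],
    "data_leak": ["data leak", "memory leak"],
    "integer_truncation": ["truncation"],
}

def match_categories(report_text: str) -> list[str]:
    """Return the keyword categories whose terms appear in the report."""
    text = report_text.lower()
    return [cat for cat, terms in KEYWORDS.items()
            if any(term in text for term in terms)]

print(match_categories("Segmentation fault (nullptr dereference) in conv op"))
# → ['segfault', 'null_pointer']
```

In practice the issue title and body would come from the GitHub search/issues API; this just shows the classification step.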
However, only TensorFlow has a detailed CVE database; the rest have no CVEs or vulnerability reports. My supervisor has informed me that I will need to draw a clear distinction between bugs and vulnerabilities, and has asked me to research how to differentiate the two and how to determine whether a given bug report is a vulnerability.
I would like to ask for help and advice on how to do so.
The ideas I currently have for determining whether a bug report is a vulnerability are:

1. Check whether the bug is exploitable by an attacker.
1. Check whether there is a code fix for the bug report – I have been advised that if a bug report has a fix, it may have been exploitable.
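To make the two checks above concrete, here is a rough triage heuristic I could imagine running over the dataset. All the field names (`has_linked_fix`, `mentions_cve`, `attacker_controlled_input`) are hypothetical placeholders; in a real pipeline they would have to be derived from GitHub metadata (linked PRs, labels, CVE IDs in comments), which is itself non-trivial:

```python
from dataclasses import dataclass

@dataclass
class IssueReport:
    # Hypothetical fields, to be populated from GitHub issue metadata.
    title: str
    body: str
    has_linked_fix: bool             # a merged PR/commit references this issue
    mentions_cve: bool               # the issue or its comments cite a CVE ID
    attacker_controlled_input: bool  # bug is reachable from untrusted input

def likely_vulnerability(issue: IssueReport) -> bool:
    """Crude triage: flag an issue as a potential vulnerability if it cites
    a CVE, or if it both has a fix and is reachable from attacker-controlled
    input (a rough proxy for exploitability)."""
    if issue.mentions_cve:
        return True
    return issue.has_linked_fix and issue.attacker_controlled_input

# Example: a fixed, attacker-reachable heap overflow is flagged;
# a fixed documentation bug is not.
print(likely_vulnerability(IssueReport("heap overflow", "", True, False, True)))
print(likely_vulnerability(IssueReport("typo in docs", "", True, False, False)))
```

This is only a starting point, of course – it is exactly the "is a fix evidence of exploitability?" assumption that I would like feedback on.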
If you would like to see the dataset that I have, feel free to DM me.
If anyone has any advice, research papers, or ideas on how to differentiate bugs from vulnerabilities, please feel free to share!