
Commit ab71aa9

updated readme and added perf stats
Signed-off-by: Mpho Mphego <mpho112@gmail.com>
1 parent 9cc0e55 commit ab71aa9

File tree

- README.md
- main.py
- requirements.txt
- resources/demo.mp4
- src/model.py

5 files changed: +246 -41 lines changed

README.md

Lines changed: 209 additions & 7 deletions
@@ -1,15 +1,203 @@

# Computer Pointer Controller

| Details | |
|-----------------------|---------------|
| Programming Language: | Python 3.6+ |
| Intel OpenVINO ToolKit: | 2020.2.120 |
| Docker (Ubuntu OpenVINO pre-installed): | [mmphego/intel-openvino](https://hub.docker.com/r/mmphego/intel-openvino) |
| Hardware Used: | Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz |
| Device: | CPU |

In this project, I used the Intel® OpenVINO [Gaze Estimation model](https://docs.openvinotoolkit.org/latest/_models_intel_gaze_estimation_adas_0002_description_gaze_estimation_adas_0002.html) to control my computer's mouse pointer: the model estimates the gaze of the user's eyes, and the mouse pointer position is updated accordingly. This project demonstrates running multiple models on the same machine and coordinating the flow of data between them.

## How It Works

The project is built with the InferenceEngine API from Intel's OpenVINO Toolkit.

The gaze estimation model requires three inputs:

- The head pose
- The left eye image
- The right eye image

These inputs are produced by three other OpenVINO models:

- [Face Detection](https://docs.openvinotoolkit.org/latest/_models_intel_face_detection_adas_binary_0001_description_face_detection_adas_binary_0001.html)
- [Head Pose Estimation](https://docs.openvinotoolkit.org/latest/_models_intel_head_pose_estimation_adas_0001_description_head_pose_estimation_adas_0001.html)
- [Facial Landmarks Detection](https://docs.openvinotoolkit.org/latest/_models_intel_landmarks_regression_retail_0009_description_landmarks_regression_retail_0009.html)

### Project Pipeline

The application coordinates the flow of data from the input, through each of the models, and finally to the mouse controller. The flow of data looks like this:

![image](https://user-images.githubusercontent.com/7910856/87787550-1db1b580-c83c-11ea-9f21-5048c803bf5c.png)
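
In code, the per-frame coordination looks roughly like the sketch below. The method names and return shapes (`predict` returning a timing and a result, `MouseController.move`, the eye/pose argument order) are assumptions pieced together from the `src/` layout and the `main.py` snippets further down, not verbatim project code:

```python
# Minimal per-frame pipeline sketch (assumed names and signatures, see above).
from src.input_feeder import InputFeeder
from src.mouse_controller import MouseController


def run_pipeline(face_detection, facial_landmarks, head_pose_estimation,
                 gaze_estimation, input_file="resources/demo.mp4"):
    video_feed = InputFeeder(input_file=input_file)
    mouse = MouseController(precision="low", speed="fast")

    for frame in video_feed.next_frame():
        _, face_bboxes = face_detection.predict(frame)      # 1. locate the face
        if not face_bboxes:
            continue                                        # no face: skip frame
        _, eyes = facial_landmarks.predict(frame)           # 2. left/right eye crops
        _, head_pose = head_pose_estimation.predict(frame)  # 3. head pose angles
        _, gaze_vector = gaze_estimation.predict(eyes, head_pose)  # 4. gaze (x, y)
        mouse.move(gaze_vector["x"], gaze_vector["y"])      # 5. move the pointer

    video_feed.close()
```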

## Demo

![video-demo](https://user-images.githubusercontent.com/7910856/87830451-50ca6800-c881-11ea-87cf-3943795a76e8.gif)

### Gaze Estimates

![all](https://user-images.githubusercontent.com/7910856/87830436-47d99680-c881-11ea-8c22-6a0a7e17c78d.gif)

### Face Detection

![face_Detection](https://user-images.githubusercontent.com/7910856/87830444-4a3bf080-c881-11ea-993a-7f76c979449f.gif)

### Facial Landmark Estimates

![facial_landmarks](https://user-images.githubusercontent.com/7910856/87830446-4c05b400-c881-11ea-90a5-d1b80d984f01.gif)

### Head Pose Estimates

![head_pose](https://user-images.githubusercontent.com/7910856/87830450-4f00a480-c881-11ea-9d0b-4b43316456a2.gif)

## Project Set Up and Installation

### Directory Structure

```bash
tree && du -sh
.
├── LICENSE
├── main.py
├── models
│   ├── face-detection-adas-binary-0001.bin
│   ├── face-detection-adas-binary-0001.xml
│   ├── gaze-estimation-adas-0002.bin
│   ├── gaze-estimation-adas-0002.xml
│   ├── head-pose-estimation-adas-0001.bin
│   ├── head-pose-estimation-adas-0001.xml
│   ├── landmarks-regression-retail-0009.bin
│   └── landmarks-regression-retail-0009.xml
├── README.md
├── requirements.txt
├── resources
└── src
    ├── __init__.py
    ├── input_feeder.py
    ├── model.py
    └── mouse_controller.py

3 directories, 16 files
37M .
```

### Setup and Installation

There are two ways of running the project:

1) Download and install the [Intel OpenVINO Toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html).
    - After you have cloned the repo, install the dependencies with:
      `pip3 install -r requirements.txt`
2) Run the project in the [Docker image](https://hub.docker.com/r/mmphego/intel-openvino) into which I have baked Intel OpenVINO and all dependencies.
    - Run: `docker pull mmphego/intel-openvino`

Not sure what Docker is? [Watch this](https://www.youtube.com/watch?v=rOTqprHv1YE).

For this project I used the latter method.

#### Models Used

I have already downloaded the models; they are located in `./models/`.
Should you wish to download the models yourself, run:

```bash
MODEL_NAME=<<name of model to download>>
docker run --rm -ti \
    --volume "$PWD":/app \
    mmphego/intel-openvino \
    bash -c "\
        /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name $MODEL_NAME"
```
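
For example, setting `MODEL_NAME=face-detection-adas-binary-0001` downloads the face detection model used in this project.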

Models used in this project:

- [Face Detection Model](https://docs.openvinotoolkit.org/latest/_models_intel_face_detection_adas_binary_0001_description_face_detection_adas_binary_0001.html)
- [Facial Landmarks Detection Model](https://docs.openvinotoolkit.org/latest/_models_intel_landmarks_regression_retail_0009_description_landmarks_regression_retail_0009.html)
- [Head Pose Estimation Model](https://docs.openvinotoolkit.org/latest/_models_intel_head_pose_estimation_adas_0001_description_head_pose_estimation_adas_0001.html)
- [Gaze Estimation Model](https://docs.openvinotoolkit.org/latest/_models_intel_gaze_estimation_adas_0002_description_gaze_estimation_adas_0002.html)

## Documentation

### Usage

```bash
$ python main.py -h

usage: main.py [-h] -fm FACE_MODEL -hp HEAD_POSE_MODEL -fl
               FACIAL_LANDMARKS_MODEL -gm GAZE_MODEL [-d DEVICE]
               [-pt PROB_THRESHOLD] -i INPUT [--out] [-mp [{high,low,medium}]]
               [-ms [{fast,slow,medium}]] [--enable-mouse] [--debug]
               [--show-bbox]

optional arguments:
  -h, --help            show this help message and exit
  -fm FACE_MODEL, --face-model FACE_MODEL
                        Path to an xml file with a trained model.
  -hp HEAD_POSE_MODEL, --head-pose-model HEAD_POSE_MODEL
                        Path to an IR model representative for head-pose-model
  -fl FACIAL_LANDMARKS_MODEL, --facial-landmarks-model FACIAL_LANDMARKS_MODEL
                        Path to an IR model representative for facial-
                        landmarks-model
  -gm GAZE_MODEL, --gaze-model GAZE_MODEL
                        Path to an IR model representative for gaze-model
  -d DEVICE, --device DEVICE
                        Specify the target device to infer on: CPU, GPU, FPGA
                        or MYRIAD is acceptable. Sample will look for a
                        suitable plugin for device specified (CPU by default)
  -pt PROB_THRESHOLD, --prob_threshold PROB_THRESHOLD
                        Probability threshold for detections filtering (0.8
                        by default)
  -i INPUT, --input INPUT
                        Path to image or video file or 'cam' for Webcam.
  --out                 Write video to file.
  -mp [{high,low,medium}], --mouse-precision [{high,low,medium}]
                        The precision for mouse movement (how much the mouse
                        moves). [Default: low]
  -ms [{fast,slow,medium}], --mouse-speed [{fast,slow,medium}]
                        The speed (how fast it moves) by changing [Default:
                        fast]
  --enable-mouse        Enable Mouse Movement
  --debug               Show output on screen [debugging].
  --show-bbox           Show bounding box and stats on screen [debugging].
```

### Example

```shell
xvfb-run docker run --rm -ti \
    --volume "$PWD":/app \
    --env DISPLAY=$DISPLAY \
    --volume=$HOME/.Xauthority:/root/.Xauthority \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    --device /dev/video0 \
    mmphego/intel-openvino \
    bash -c "\
        source /opt/intel/openvino/bin/setupvars.sh && \
        python main.py \
            --face-model models/face-detection-adas-binary-0001 \
            --head-pose-model models/head-pose-estimation-adas-0001 \
            --facial-landmarks-model models/landmarks-regression-retail-0009 \
            --gaze-model models/gaze-estimation-adas-0002 \
            --input resources/demo.mp4 \
            --debug \
            --show-bbox \
            --enable-mouse \
            --mouse-precision high \
            --mouse-speed fast"
```

### Packaging the Application

We can use the [Deployment Manager](https://docs.openvinotoolkit.org/latest/_docs_install_guides_deployment_manager_tool.html) included in OpenVINO to create a runtime package from our application. These packages can easily be sent to other hardware devices to be deployed.

To deploy the application to various devices using the Deployment Manager, run the steps below.
Note: Choose the target from the devices listed below.

```bash
DEVICE='cpu' # or gpu, vpu, gna, hddl
docker run --rm -ti \
    --volume "$PWD":/app \
    mmphego/intel-openvino bash -c "\
        python /opt/intel/openvino/deployment_tools/tools/deployment_manager/deployment_manager.py \
            --targets ${DEVICE} \
            --user_data /app \
            --output_dir . \
            --archive_name computer_pointer_controller_${DEVICE}"
```
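
The generated archive can then be copied to the target device and extracted there; it bundles the runtime files for the chosen target together with the application files passed via `--user_data`.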

## Benchmarks

*TODO:* Include the benchmark results of running your model on multiple hardwares and multiple model precisions. Your benchmarks can include: model loading time, input/output processing time, model inference time etc.

@@ -23,5 +211,19 @@ This is where you can provide information about the stand out suggestions that y

### Async Inference

If you have used Async Inference in your code, benchmark the results and explain its effects on power and performance of your project.

## Edge Cases

- Multiple people in frame: the application always uses, and reports results for, only one face, even when several people are detected. One way to make this deterministic is to keep the largest detected face, as in the sketch after this list.
- No head detected: the frame is skipped and the user is informed.
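
A minimal sketch of that "largest face wins" rule, assuming bounding boxes are `(xmin, ymin, xmax, ymax)` tuples (the project's actual bbox format may differ):

```python
# Sketch: pick the largest detected face so multi-person frames are predictable.
# Assumes each bbox is an (xmin, ymin, xmax, ymax) tuple; adapt to the real format.
def largest_face(face_bboxes):
    if not face_bboxes:
        return None  # no head detected: the caller should skip this frame
    return max(
        face_bboxes,
        key=lambda box: (box[2] - box[0]) * (box[3] - box[1]),  # box area
    )
```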

## Areas of Improvement

- [Intel® VTune™ Profiler](https://software.intel.com/content/www/us/en/develop/tools/vtune-profiler/choose-download.html): profile the application and locate any bottlenecks.
- Gaze estimation: we could revisit the logic for determining and calculating the coordinates, as it is a bit flaky.
- Lighting conditions: we might use HSV-based pre-processing steps to minimize errors due to varying lighting conditions; see the sketch after this list.
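
One plausible form of such a pre-processing step (a sketch assuming OpenCV; not currently part of the pipeline) equalizes the brightness channel in HSV space before inference:

```python
import cv2


def normalize_lighting(frame):
    """Sketch: even out frame brightness by equalizing the V channel in HSV."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    v = cv2.equalizeHist(v)  # spread brightness over the full 0-255 range
    return cv2.cvtColor(cv2.merge((h, s, v)), cv2.COLOR_HSV2BGR)
```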

## Reference

- [OpenCV Face Recognition](https://www.pyimagesearch.com/2018/09/24/opencv-face-recognition/)
- [Tracking your eyes with Python](https://medium.com/@stepanfilonov/tracking-your-eyes-with-python-3952e66194a6)
- [Real-time eye tracking using OpenCV and Dlib](https://towardsdatascience.com/real-time-eye-tracking-using-opencv-and-dlib-b504ca724ac6)
- [Deep Head Pose](https://github.com/natanielruiz/deep-head-pose/blob/master/code/utils.py#L86+L117)

main.py

Lines changed: 28 additions & 34 deletions
@@ -1,23 +1,8 @@
-"""
-USAGE
-
-xhost +; docker run --rm -ti \
-    --volume "$PWD":/app \
-    --env DISPLAY=$DISPLAY \
-    --volume="$HOME/.Xauthority":/root/.Xauthority \
-    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
-    --device /dev/video0 \
-    mmphego/intel-openvino \
-    bash -c "source /opt/intel/openvino/bin/setupvars.sh && \
-    python main.py \
-    --face-model models/face-detection-adas-binary-0001 \
-    --facial-landmarks-model models/landmarks-regression-retail-0009 \
-    --head-pose-model models/head-pose-estimation-adas-0001 \
-    --gaze-model models/gaze-estimation-adas-0002 \
-    --input resources/demo.mp4";
-"""
+#!/usr/bin/env python3

 import argparse
+import time
+
 from loguru import logger

 from src.input_feeder import InputFeeder
@@ -73,21 +58,21 @@ def arg_parser():
         help="Specify the target device to infer on: "
         "CPU, GPU, FPGA or MYRIAD is acceptable. Sample "
         "will look for a suitable plugin for device "
-        "specified (CPU by default)",
+        "specified (Default: CPU)",
     )
     parser.add_argument(
         "-pt",
         "--prob_threshold",
         type=float,
         default=0.8,
-        help="Probability threshold for detections filtering" "(0.8 by default)",
+        help="Probability threshold for detections filtering" "(Default: 0.8)",
     )
     parser.add_argument(
         "-i",
         "--input",
         required=True,
         type=str,
-        help="Path to image or video file or 'cam' for Webcam.",
+        help="Path to image, video file or 'cam' for Webcam.",
     )
     parser.add_argument(
         "--out", action="store_true", help="Write video to file.",
@@ -123,7 +108,6 @@ def arg_parser():
         action="store_true",
         help="Show bounding box and stats on screen [debugging].",
     )
-
     return parser.parse_args()

@@ -153,9 +137,9 @@ def main(args):
         + gaze_estimation._model_load_time
     ) / 1000
     logger.info(f"Total time taken to load all the models: {model_load_time:.2f} secs.")
-
+    count = 0
     for frame in video_feed.next_frame():
-
+        count += 1
         predict_end_time, face_bboxes = face_detection.predict(
             frame, show_bbox=args.show_bbox
         )
@@ -202,17 +186,27 @@ def main(args):
             mouse_controller.move(gaze_vector["x"], gaze_vector["y"])

         if args.debug:
+            if face_bboxes:
+                text = f"Face Detection Inference time: {predict_end_time:.3f} s"
+                face_detection.add_text(
+                    text, frame, (15, video_feed.source_height - 80)
+                )
+                text = (
+                    f"Facial Landmarks Est. Inference time: "
+                    f"{facial_landmarks_pred_time:.3f} s"
+                )
+                facial_landmarks.add_text(
+                    text, frame, (15, video_feed.source_height - 60)
+                )
+                text = f"Head Pose Est. Inference time: {hp_est_pred_time:.3f} s"
+                head_pose_estimation.add_text(
+                    text, frame, (15, video_feed.source_height - 40)
+                )
+                text = f"Gaze Est. Inference time: {gaze_pred_time:.3f} s"
+                gaze_estimation.add_text(
+                    text, frame, (15, video_feed.source_height - 20)
+                )
             video_feed.show(video_feed.resize(frame))
-            text = f"Face Detection Inference time: {predict_end_time:.3f} s"
-            face_detection.add_text(text, frame, (15, video_feed.source_height - 80))
-            text = f"Facial Landmarks Est. Inference time: {facial_landmarks_pred_time:.3f} s"
-            facial_landmarks.add_text(text, frame, (15, video_feed.source_height - 60))
-            text = f"Head Pose Est. Inference time: {hp_est_pred_time:.3f} s"
-            head_pose_estimation.add_text(
-                text, frame, (15, video_feed.source_height - 40)
-            )
-            text = f"Gaze Est. Inference time: {gaze_pred_time:.3f} s"
-            gaze_estimation.add_text(text, frame, (15, video_feed.source_height - 20))

     video_feed.close()

requirements.txt

Lines changed: 5 additions & 0 deletions
@@ -1 +1,6 @@
+loguru==0.4.1
+matplotlib==3.2.2
+numpy==1.19.0
 PyAutoGUI==0.9.50
+python3-xlib==0.15
+tqdm==4.47.0

resources/demo.mp4

1.83 MB
Binary file not shown.

src/model.py

Lines changed: 4 additions & 0 deletions
@@ -62,6 +62,7 @@ def __init__(
         self._init_image_w = source_width
         self._init_image_h = source_height
         self.exec_network = None
+        self.perf_stats = {}
         self.load_model()

     def _get_model(self):
@@ -112,6 +113,9 @@ def predict(self, image, request_id=0, show_bbox=False, **kwargs):
             pred_result.append(
                 self.exec_network.requests[request_id].outputs[output_name]
             )
+            self.perf_stats[output_name] = self.exec_network.requests[
+                request_id
+            ].get_perf_counts()
         predict_end_time = float(time.time() - predict_start_time) * 1000
         bbox, _ = self.preprocess_output(pred_result, image, show_bbox=show_bbox)
         return (predict_end_time, bbox)
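
With this change, each model keeps OpenVINO's per-layer performance counters after every inference. A quick way to inspect them after a run might look like the sketch below; `report_perf_stats` and `top_n` are hypothetical helpers, not part of the repo, and the counter keys are assumptions about what `InferRequest.get_perf_counts()` returns:

```python
# Sketch: summarize the per-layer counters collected in model.perf_stats.
def report_perf_stats(model, top_n=5):
    for output_name, counters in model.perf_stats.items():
        # Each entry is assumed to map layer name -> {"real_time": us,
        # "cpu_time": us, "status": ..., "layer_type": ..., "exec_type": ...}.
        slowest = sorted(
            counters.items(),
            key=lambda item: item[1].get("real_time", 0),
            reverse=True,
        )[:top_n]
        print(f"{output_name}: {top_n} slowest layers")
        for layer, stats in slowest:
            print(f"  {layer}: {stats.get('real_time', 0)} us")
```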
